The permanent revolution


To rediscover its glorious scientific past and build a knowledge-driven economy, Russia must 
break old habits and loosen state control on research. 


financial drought, and despite the casual disdain for all
things intellectual shown by the profit-crazed oligarchy who
have become Russia's elite, research is reclaiming its place as one
of the country’s most noble institutions.

Much of the credit for this improved situation must go to Andrei 
Fursenko, the science and education minister in the government of 
Prime Minister Vladimir Putin. Fursenko, a physicist trained at the 
prestigious Ioffe Institute in St Petersburg, understands how modern
science works, and knows where and why the Russian research system 
is in disorder. Not everything he does pleases the Russian academic 
establishment. But this in itself can be considered an endorsement of 
Fursenko’s approach, given the establishment's inclination to recycle 
the past rather than turn to modern conventions such as international 
peer review and scientific competition. 

Among the most visible signs of the improved health of science 
in Russia, and of Fursenko’s guiding hand, are the government 
programmes set up to establish cutting-edge research at Russia's 
long-neglected universities. These focus in particular on efforts to get 
experienced Western scientists to do research at Russian university 
labs through the ‘mega-grant’ programme, launched last year. 

Russia being Russia, Fursenko’s efforts have tended to get bogged 
down by the state's bureaucratic superstructure, to which science and the 
freedom to pursue it mean very little. As we report on page 17, the most 
recent example of this is the stalling of a prominent German-Russian 
mega-grant project to study carbon flux in the environment, which 
came to a halt on the command of Russia’s security services. In this case, 
Fursenko seems to have won the battle — the project will go ahead, but 
institutional barriers to collaborative projects remain. Western scientists 
and companies are learning the hard way that over-regulation in Russia 
is a different beast to the red tape they encounter at home. 

The purchase, import and export of equipment and samples require 
federal security approval that can be grindingly difficult to obtain. 
Federal security services need not justify nor explain their rulings. 
There is no formal way to appeal even obviously offhand decisions, 
and it is downright impossible for grant holders to communicate with 
local or federal officers in charge. At lower administrative levels, brib- 
ery is yet to be properly addressed, and officials’ insistence that every 
piece of research equipment is purchased through designated Russian 
agencies (usually at inflated prices) borders on institutional corruption. 

Faced with this situation, foreign scientists given mega-grant 
projects could be forgiven if they elected to do research and spend 
grant money in their home countries, rather than at the Russian host 
institutes. This undermines one of the programme's main aims — to 
bring Russian students and young scientists into contact with high- 
profile international science early on in their careers — and threatens 
to diminish its effect on the modernization of Russian science. 

Fursenko cannot change the system alone, but must continue to do
what he can. All scientists who are participating in the current round
of the mega-grant programme, for example, will need clear instruc- 
tions on deadlines and approval procedures for their projects. And 
there must be guidance on which formal responsibilities lie with the 
grant holder, and which ones lie with the host institute. 

If Russia is serious in its ambition to develop a knowledge-driven 
economy, it must substantially reduce the level of state control on 
research and development. It has given science a helping hand, 
but, as Fursenko seems to know and as Putin must also understand,
further progress needs freedom.


Reality check 


Who’d be a scientist? As funding levels fall and
competition rises, no one seeking leisure. 


The contrast could not be greater. Julie Overbaugh, a lab head
at the Fred Hutchinson Cancer Research Center in Seattle,
Washington, who researches the molecular virology of HIV,
advocates the need for labs that allow their researchers a fulfilling life
outside the lab (page 27). Conversely, Alfredo Quiñones-Hinojosa,
a stem-cell neurologist and surgeon who heads the brain-tumour
programme at Johns Hopkins University in Baltimore, Maryland,
drives himself and his lab members as close to a 24/7 working life as
is humanly possible (page 20). What might a young scientist make
of these two styles, apart from the observation that it takes all sorts?

The necessity for hard work in science has long been emphasized. In
his classic Advice to a Young Scientist, published in 1979, Peter Medawar
emphasized the competitiveness of science and the inevitable concerns
about priority. He also issued a golden rule: if you want to make important
discoveries, choose an important problem. However, such problems
add up to a recipe for perpetual hard work: important problems
not only attract the most ambitious scientists but also present risks
and false steps in the innovative approaches required to address them.

Overbaugh is right to highlight a need for time away from the bench
or computer for creative reflection. Lab heads also need to ensure that
their younger lab members maintain a sense of autonomy rather than
of cog-in-the-machine. And young scientists applying for posts must
understand what sort of lab head they are dealing with. But many older
folk wistfully recall their early postdoc careers, when they had one or two
clear challenges to focus on late into the night, and over weekends too.

As research funding declines in many countries, science will intensify.
Anyone lacking the inner intellectual drive and a capacity for relentless
focus to get to the heart of the way the world works should stay away.


The Arab Spring offers
hope but no quick fix


Revolutions in Libya and elsewhere have raised hopes for science in the Arab
world. But progress will be slow, cautions Rana Dajani.


months of conflict has once again focused minds on the prospect
for fundamental change in the Arab world. Science is not
a high priority for countries that have just rid themselves of dictators,
but in the wake of the uprisings and protests it is natural for researchers
in those nations and colleagues abroad to see opportunities to improve
the generally abysmal state of science in Arab countries.

Significant change is unlikely soon. Six months on from the first
events of the Arab Spring, there have been no concrete improvements
for scientists here in Jordan, and I get the same impression from colleagues
in Egypt and Tunisia. The kind of change needed to improve
the state of science takes a long time. It is about rebuilding institutions,
providing the right environment, obliterating former habits, dismantling
bureaucracies, changing mentalities and re-educating people.
Such change will take a generation.

One positive thing that I do see and feel is the general attitude of
the people, who are more optimistic that things will change for the
better. And officials are more reluctant now to exploit and abuse their
positions, as they are more likely to be held accountable for their
actions. Although the outside world may see headlines about fancy
projects such as the building of new institutions, the change to science
required in Arab countries is not about bricks-and-mortar improvement
but about building intellectual capacity.

There is no lack of minds in the Arab Muslim world, as shown by the
many Arab Muslim scientists from these regions doing great science in
Western universities. The problem is the environment, which fails to
sustain creativity, curiosity and striking out into the unknown, all of
which are essential for science to flourish.
Such an environment is created only through experience. The whole
community must experience the need to find the solutions. And this
requires freedom. Dictatorships in Arab Muslim countries thrive on
ignorance and fear, and this lack of freedom filters through the whole
community, affecting not just political life but reaching down to the
household itself, to how parents deal with their children.

I strongly believe that an essential first step towards freeing minds
from the habits of the past is to plant the love of reading in our young
children. In this way, they revisit other people’s experiences across time
and space, learn that there are other ways of living, and develop respect
for other perspectives. When children read, their horizons expand
and they build the confidence to face challenges,


Arab world by training women to read aloud to children in their local 
neighbourhoods. 

Through donations and discounts from publishers, we have set up 
100 children’s libraries across Jordan, and the concept is spreading across 
the region. By next year, some 20,000 children will have benefited. 

Children learn to form their own opinions on the basis of reasoning 
and deduction. But this requires practice, and this is what we need 
to encourage. Many students at my university have never formed an 
independent opinion that reflects their own original thinking. The 
day I got my students writing essays to express themselves was the day 
one student told me that he felt human, that he was Someone with a 
capital S. These are the people who will build our communities and 
nations, who will make a difference, who will take us into the twenty- 
first century with confidence and progress. 

Alongside this cultural shift, we also need to assess carefully the 
relationship between Islam and science, particularly in fields with an 
ethical content, such as stem-cell research. Ethi- 
cal guidelines for bioengineering and biomedical 
science for the Muslim world must be drawn up 
by committees that include scientists, physicians, 
Islamic scholars and Arabic language specialists. 

We have established such a committee at my 
university, and our discussions indicate that 
stem-cell research is permissible in Islam, as long 
as it is carried out with the purpose of improv- 
ing human health. This conclusion must be 
re-examined as the field advances. 

Such a multidisciplinary approach, new to 
the Islamic world, is essential to challenge stag- 
nant thinking based on literal interpretations of 
Islamic sources. The Koran is not a book of scien- 
tific facts. It contains verses that describe worldly 
phenomena, but these are presented as evidence 
of the elegance and simplicity of creation. Islam is a spiritual guide to 
life. It teaches us how to live in harmony with ourselves, our fellow 
humans and the world. There is no conflict between Islam and science. 

Islam asks us to use our minds to explore the world around us. It calls 
for the use of scientific methodology and logic in our approach to sci- 
ence. The verses of the Koran are interpreted by humans, and humans 
are limited by the scientific knowledge of the era in which the verse was 
interpreted. The path ahead is not easy, and change will not happen 
overnight. Still, the Prophet Muhammad said: “Do not belittle any 
act of good.” 

In Libya and elsewhere, we can reasonably hope that we have seen 
acts of good.


Rana Dajani is assistant professor of molecular biology at the 
Hashemite University in Zarqa, Jordan. 
e-mail: rdajani@hu.edu.jo


POLICY


Turkish decree 


The Turkish government
unexpectedly shook up
national research politics on
27 August with a decree that
gives it the power to nominate
the president and vice-presidents
of the previously
autonomous Turkish Academy
of Sciences. The government
will also appoint four members
of the academy’s 14-strong,
decision-making council. The
decree requires the academy to
establish and finance a series of
new basic-research institutes,
and enables the government
to nominate top personnel
in TUBITAK, the Turkish
research-funding agency.


Dollar disclosure

On 23 August, the US National
Institutes of Health (NIH)
unveiled its new policy for
disclosing financial conflicts
of interest. In the interests of
transparency and public trust,
the guidelines impose stricter
reporting requirements on both
federally funded researchers
and their institutions. But
the NIH confirmed that it is
backing down from an earlier
proposal to require disclosure
of conflicts of interest on a
public website, and is instead
allowing institutions to make
the information available on
request. See go.nature.com/oce4n3
for more.


Greek reform


On 24 August, Greece's
parliament passed sweeping
reforms to higher education
that aim to modernize
universities and make it
easier for Greek scientists
working abroad to return (see
Nature 475, 13-14; 2011).
The parliament also agreed
to abolish a university asylum
law intended to stop police
intervening in academic
affairs. The law prevents police
from entering university
campuses, but has long been
exploited by criminals. It
was introduced in 1974 after
the fall of the Greek military
dictatorship, which had
brutally suppressed a student
uprising in Athens in 1973.


Virginia quake deals seismic surprise


A magnitude 5.8 earthquake in rural Virginia
on 23 August caused disruption across broad
swathes of the eastern United States, where it was
the most significant tremor in a century. Major
earthquakes are rare in the region because its
crust is old and mostly stable, but when quakes
do occur, the strong crustal rock transmits
seismic waves with relatively little loss of energy,
so they can cover vast distances. Shaking was
felt along the eastern seaboard, from Florida to
Nova Scotia in Canada; and the quake caused an
automatic shutdown at the nearby North Anna
nuclear power plant in Mineral, Virginia. There
was some damage to buildings in urban centres,
including Washington DC (its cathedral is
pictured). See go.nature.com/sgugvi for more.


US energy audit


An audit report from the
US Department of Energy's
inspector-general has
criticized the branch of the
department that specializes in
funding high-risk, high-pay-off
research. The 22 August
report says that the Advanced
Research Projects Agency-Energy
(ARPA-E) lacked
policies to ensure oversight
and monitoring of awards
worth US$368.6 million, many
of them made from funds
received through the 2009
stimulus bill. ARPA-E says it
is remedying the situation.
But the report may fuel
political concern: it comes
just three months after the
House of Representatives
recommended cutting the
agency’s $180-million budget
by 45% for 2012. See go.nature.com/qkojko
for more.


Arctic push

Denmark last week
announced a ten-year strategy
for its priorities in the Arctic,
declaring that, together with
Greenland and the Faroe
Islands, it would welcome
industrial development in the
region, but also respect the
Arctic’s fragile environment.
Denmark will need to both
cooperate and compete with
its Scandinavian rivals in
Arctic development, as well as
with Russia, Canada and the
United States.


Ocean drilling 


Just two months after the 
26-nation Integrated Ocean 
Drilling Programme (IODP) 
released a new decadal 
science plan, the United 
States has said it cannot afford 
to take part in the research 
consortium after 2013. The 
decision was announced last 
month by the US National 
Science Foundation in a letter 
to the IODP community, 

and reported in Science last 
week. Yet it was not entirely 
unexpected — worries had 
been building since the spring, 
when it emerged that rising 
fuel costs were limiting the 
activities of the US drilling ship 
JOIDES Resolution (see Nature 
473, 137; 2011). IODP partners 
are now discussing whether a 
less-costly cooperation might 
allow US scientists to take part 
in international ocean-drilling 
projects after 2013. 


RESEARCH


Exploding star


The closest type Ia supernova 
in nearly 40 years has been 
spotted in the spiral galaxy 
M101 by astronomers at
the Palomar Observatory in
California. As the brightest
and most energetic kind of
stellar explosion, it can be used
to measure the accelerating
expansion of the Universe.
Astronomers are now
scrambling to study the event
as it brightens over the coming
weeks. See go.nature.com/oraf5x
for more.


Ethics report

US Public Health Service
researchers knew that they
were acting unethically when
they exposed hundreds of
Guatemalans to sexually
transmitted diseases in the
1940s without the subjects’
consent, according to the
Presidential Commission for
the Study of Bioethical Issues.
The commission will now
assess whether current research
rules protect trial participants
from similar abuses, and report
its conclusions in December.


TREND WATCH


Surveys in the United Kingdom
and the United States suggest that
the rate of increase in obesity is
slowing. Even so, the prevalence
of obesity in US adults is set
to grow from around 32% in
2007-08 to 50% in 2030 for men,
and from 35% to 45% for women.
The projections were published
on 27 August (Y. C. Wang et al.
Lancet 378, 815-825; 2011).
The study also estimates that by
2030, the annual cost of treating
obesity-related diseases in the
United States will have risen by
US$48 billion–66 billion.


EVENTS 


Storm damage 


The hurricane season has 
definitely arrived. Torrential 
rain, flash floods and 
140 km h⁻¹ winds left a trail of
destruction along the east coast
of the United States this week.
Irene (pictured) may have been
downgraded from hurricane
status to a category 1 storm
by the time it made landfall in
North Carolina, but it still left 
millions without power and 
caused more than 40 deaths as 
it moved northwards. Irene was 
the ninth named storm of the 
season; forecasters predict up 
to 19 this year. 


Space failure 


The next manned mission to 
the International Space Station 
will be delayed by a month,
to October, following the loss
of a Russian cargo capsule
carrying fresh supplies into
orbit. The launch vehicle, a
Soyuz-U rocket, was lost after
its third stage failed around
5 minutes after lift-off from
the Baikonur Cosmodrome
in Kazakhstan. The failure
raises questions about quality 
control in the Russian space 
programme: in August, a 
Russian telecommunications 
satellite aboard a Proton rocket 
was delivered to the wrong 
orbit. See go.nature.com/Ifp5zl 
for more. 


PEOPLE 


Research head 
Donald Dingwell is the
new secretary-general of
the European Research
Council (ERC), Europe's
competitive funding agency.
The geoscientist, currently
at Ludwig Maximilian
University in Munich,
Germany, will act as a liaison
between the ERC and the 
European Commission. He 
fills a year-long vacancy left 
after the departure of the 
previous incumbent, Andreu 
Mas-Colell. 


Iran assassin


The alleged assassin of an
Iranian particle physicist
killed by a bomb explosion
last year pleaded guilty in a
Tehran court on 24 August.
Majid Jamali Fashi confessed
to setting up the bomb that on
12 January 2010 killed Masoud
Alimohammadi on his way
to work at the University of
Tehran (see Nature 463, 279;
2010). Prosecutors said that
Israel's intelligence agency
Mossad was ultimately behind
the murder, but observers
worry that the confession
was a show trial. Another
Iranian physicist was killed
last November, and an Iranian
electrical engineering student
was murdered in July. See
go.nature.com/tukvtk for more.


[Figure: The obesity tide. Projections suggest that more than 50% of US male adults are likely to be obese by 2030.]


SEVEN DAYS | THIS WEEK


2-3 SEPTEMBER
Nature.com and Digital
Science hold their fourth
annual Science Online
London conference to
explore how the web is
changing science.
www.scienceonlinelondon.org


8 SEPTEMBER
NASA's two GRAIL
spacecraft launch for the
Moon, where they will
fly in tandem to map its
gravitational field. See
page 16.
go.nature.com/msewft


Researcher returns 
Wildlife biologist Charles 
Monnett of the US Bureau of 
Ocean Energy Management, 
Regulation and Enforcement 
(BOEMRE) went back to 
work on 26 August after a 
six-week suspension triggered 
by an investigation by the US 
Department of the Interior’s 
inspector-general. BOEMRE 
says that Monnett, who in 
2006 first observed polar bears 
that had apparently drowned 
while searching for sea ice, 
remains under investigation. 
He will now do environmental- 
assessment research rather 
than return to his former role 
managing research contracts. 
See go.nature.com/ebvtvb
for more.


NATURE.COM
For daily news updates see: 
www.nature.com/news 
Yoshihiko Noda is Japan’s sixth prime minister in five years, replacing Naoto Kan after his resignation.


Japan’s new leader
faces energy gap


Power policy remains in flux.


BY DAVID CYRANOSKI IN TOKYO


Yoshihiko Noda, Japan’s new prime
minister, has a daunting in-tray to
deal with: a faltering economy, a huge
reconstruction effort following the devastating
earthquake and tsunami in March, and an
ongoing nuclear emergency at the Fukushima
Daiichi power plant. Elected on 29 August
after Naoto Kan’s resignation last week as
leader of the ruling Democratic Party of Japan,
Noda must also contend with the threat of a
complete vacuum where the country’s energy
policy should be.

After the Fukushima disaster, Kan vowed to
“leave nuclear energy behind”, but made no clear
plans to fill the energy gap. And neither Noda
nor the other leadership contenders articulated
a clear position on energy during the brief election
campaign. When pressed on the issue,
they steered clear of strong statements about
the future of nuclear power or how they would
keep the lights on in its
absence. “We have no
energy policy,” says Tatsuo
Oyama, who studies
investment in power
utilities at the National
Graduate Institute for
Policy Studies in Tokyo.
“It’s a serious issue that
Noda has to deal with.”


Last year, Japan committed to constructing 
14 new reactors so that nuclear power would 
provide half of the country’s electricity by 2030. 
But after the accident at Fukushima, public 
mistrust of the technology prompted Kan to 
ditch the plan. He also ordered existing reac- 
tors to be taken offline for a series of stress tests, 
in addition to normal inspections. All but one 
of the country’s 54 existing reactors are set to be 
shut down by May 2012, removing a quarter of 
Japan's power capacity. Restarting them would 
require approval from local governments, 
which they may be reluctant to grant. 

Kan had promised that renewables and effi- 
ciency measures would make up the shortfall, 
but did not explicitly say how. “He made a 
change but he did it without debate and without 
a road map,” says Tsutomo Toichi, an energy 
specialist at the Japanese Institute of Energy 
Economics in Tokyo. “It was very abrupt.”

The short-term consequence has been a 
jump in the use of fossil fuels, including an 
estimated 20% increase in imports of expen- 
sive liquefied natural gas. Toichi says that 
there could be a 20% increase in the cost of 
producing electricity in 2012 if nuclear reac- 
tors are shut down as expected, probably 
leading to higher rates for consumers. This 
week, the Tokyo Electric Power Company 
announced that it is considering a 10% hike 
in next year’s electricity rate. Citizens’ will- 
ingness to economize has trimmed power 
demand in Japan, but Oyama believes that the 
sagging economy has also reduced electricity 
consumption. Hiromasa Yonekura — chair- 
man of Sumitomo Chemical, headquartered 
in Tokyo and Osaka, and head of the powerful 
Japan Business Federation (Keidanren) — has 
repeatedly warned that high prices could force 
companies to move operations overseas. 

To prevent this, Toichi says that the new 
government must get the nuclear plants back 
online as soon as possible. Despite public 
opposition, Noda has hinted that nuclear will 
continue to make up part of Japan’s energy
mix, and industry minister Banri Kaieda
announced for the first time this week that 
he expects some reactors to restart this year. 

That would be a step in the wrong direction,
says Tetsunari Iida, director of the
Tokyo-based Institute for Sustainable Energy 
Policies, who wants the country to seize the 
opportunity to invest in renewable energy. 

On 12 September, Iida will launch the
Japan Renewable Energy Foundation, which 
is backed by ¥1 billion (US$13 million) 
from Japan's richest man, telecoms mogul 
Masayoshi Son. The foundation will bring 
together some 100 experts from around the 
world to analyse obstacles to implementing 
renewable energy, and offer policy recom- 
mendations to the new government. 

The foundation’s cause did get a boost 
from Kan, who had agreed to step down 
only if Japan’s parliament passed a bill to 
support clean energy. The bill, which passed 
on 26 August, will next year guarantee a 
minimum price for wind, solar and other 
renewable energies that will make them 
more attractive for suppliers to invest in. 

Iida is confident that this will help to
sustain the shift away from conventional 
power sources. He notes that, from 2008 to 
2010, Japan’s annual increase in solar-power 
capacity jumped from 230 megawatts to 
almost 1 gigawatt, giving the country a total 
capacity of 3.6 gigawatts. He believes that 
the renewables bill will make huge solar 
farms economically attractive for the first 
time. “I think next year we'll see a tenfold
jump in solar and fivefold in wind,” he predicts.
Many say that this is unrealistic. “It’s
impossible within two years, and too ambi- 
tious within ten,” says Toichi. 

A bill on global warming could also boost
renewables by preventing Japan from filling
its energy gap with fossil fuels. But the
bill’s prospects are dim, even though its aim,
a legally binding target to reduce greenhouse-gas
emissions to 25% below 1990 levels, was once
a cornerstone of the Democratic Party’s
manifesto. “Because of the reduction in our
reliance on nuclear power, it will be very
difficult” to hit that target, says Toichi.

Japan’s parliament has so far declined to 
consider the bill, and political support for it 
is waning rapidly. “The new majority position
[within the Democratic Party] is to get
rid of the 25% target,” says Iida.

Yet despite the uncertainties that now 
cloud Japan’s future, Kan’s administration 
did at least deliver one clear achievement 
on energy policy, says Iida. “There is now
a broad consensus that we need to reduce
our reliance on nuclear power. That is the
atmosphere now.”


Variation among Arabidopsis strains can reflect genetic adaptation to their local habitat.


Halfway point for 
1,001 genomes quest 


Plant researchers map diversity of Arabidopsis thaliana. 


BY HEIDI LEDFORD 


Joseph Ecker’s teenage son listened intently as
his father told him about the 1000 Genomes
Project, which aims to sequence and compare
the genomes of 1,000 people. Ecker, a
molecular geneticist, explained that he and his
colleagues were launching a similar project for
the plant Arabidopsis thaliana. “My son said,
‘Well then you should sequence 1,001!’” Ecker
recalls. “He's a very competitive kid.”

And so the Arabidopsis ‘1001 Genome Project’
was born. More than four years later, a loose
confederation of laboratories is on the verge of
making that challenge a reality. Papers published
online in Nature¹ and Nature Genetics² this week
report the sequencing of nearly 100 A. thaliana
genomes, the first swathe released by the project;
around 400 more have been sequenced,
but are not yet ready for publication. Last week,
Ecker’s group at the Salk Institute in La Jolla,
California, won a US$2-million grant from the
National Science Foundation (NSF) to polish off
another 500 strains, and to catalogue expressed
RNAs and map DNA methylation, a chemical
modification that affects gene expression.

Arabidopsis thaliana, or thale cress, is a small
weed with a simple genome that stands in as a
genetic reference for plants with more complex
genomes. The genome project aims to uncover
genetic changes that enable plants to adapt to
their local environments. There are thousands
of strains of A. thaliana in stocks worldwide,
each of which might carry unique traits that
helped it to thrive in its natural environment:
tolerance for drought, perhaps, or defences
against viral pathogens. “If you learn which
genes are important for these traits, you could
breed them into crops, to allow them to move
into a new environment or continue to succeed 
where they face climate change,” Ecker says.

The mining of natural variation for genetic 
information has gained momentum as faster 
DNA sequencing has delivered multiple 
genomes from wild populations. Similar pro- 
jects are under way in mice, fruitflies, rice 
and, of course, humans. “If you go into nature, 
you find all these fascinating mutations that 
have survived the sieve of natural selection,”
says geneticist Trudy Mackay of North Carolina
State University in Raleigh, who leads the
work in fruitflies. “But in the past we've been
hampered in our ability to tease them apart.”

The 1001 Genome Project has had some 
problems, however. Unable to get funding for a 
single project, participating labs went their own 
ways, getting grants from a variety of sources, 
says Detlef Weigel, a plant biologist at the Max 
Planck Institute for Developmental Biology in 
Tübingen, Germany, who has spearheaded the
project. The result was a fragmented effort, with 
each group sequencing strains and using tech- 
niques that best fitted its own research. 

And Ecker frets that this ad hoc coalition 
won’t even have a central place to deposit
and organize its data. Arabidopsis research- 
ers have relied on The Arabidopsis Informa- 
tion Resource (TAIR), but NSF funding for 
that project is being phased out and its fate is 
unclear. “We don’t want to have these data scattered
all over the place,” says Ecker, “but there
may be nowhere to put them.”


1. Gan, X. et al. Nature http://dx.doi.org/10.1038/ 
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> mix, and industry minister Banri Kaieda 
announced for the first time this week that 
he expects some reactors to restart this year. 

That would be a step in the wrong direc- 
tion, says Tetsunari lida, director of the 
Tokyo-based Institute for Sustainable Energy 
Policies, who wants the country to seize the 
opportunity to invest in renewable energy. 

On 12 September, Iida will launch the Japan Renewable Energy Foundation, which is backed by ¥1 billion (US$13 million) from Japan's richest man, telecoms mogul Masayoshi Son. The foundation will bring together some 100 experts from around the world to analyse obstacles to implementing renewable energy, and offer policy recommendations to the new government.

The foundation’s cause did get a boost 
from Kan, who had agreed to step down 
only if Japan’s parliament passed a bill to 
support clean energy. The bill, which passed 
on 26 August, will next year guarantee a 
minimum price for wind, solar and other 
renewable energies that will make them 
more attractive for suppliers to invest in. 

Iida is confident that this will help to sustain the shift away from conventional power sources. He notes that, from 2008 to 2010, Japan’s annual increase in solar-power capacity jumped from 230 megawatts to almost 1 gigawatt, giving the country a total capacity of 3.6 gigawatts. He believes that the renewables bill will make huge solar farms economically attractive for the first time. “I think next year we'll see a tenfold jump in solar and fivefold in wind,” he predicts. Many say that this is unrealistic. “It’s impossible within two years, and too ambitious within ten,” says Toichi.

A bill on global warming could also boost renewables by preventing Japan from filling its energy gap with fossil fuels. But the bill’s prospects are dim, even though its aim, a legally binding target to reduce greenhouse-gas emissions to 25% below 1990 levels, was once a cornerstone of the Democratic Party’s manifesto. “Because of the reduction in our reliance on nuclear power, it will be very difficult” to hit that target, says Toichi.

Japan’s parliament has so far declined to consider the bill, and political support for it is waning rapidly. “The new majority position [within the Democratic Party] is to get rid of the 25% target,” says Iida.

Yet despite the uncertainties that now cloud Japan’s future, Kan’s administration did at least deliver one clear achievement on energy policy, says Iida. “There is now a broad consensus that we need to reduce our reliance on nuclear power. That is the atmosphere now.”
Variation among Arabidopsis strains can reflect genetic adaptation to their local habitat.


Halfway point for 
1,001 genomes quest 


Plant researchers map diversity of Arabidopsis thaliana. 


BY HEIDI LEDFORD 


his father told him about the 1000 Genomes Project, which aims to sequence and compare the genomes of 1,000 people. Ecker, a molecular geneticist, explained that he and his colleagues were launching a similar project for the plant Arabidopsis thaliana. “My son said, ‘Well then you should sequence 1,001,’” Ecker recalls. “He's a very competitive kid.”

And so the Arabidopsis ‘1001 Genome Project’ was born. More than four years later, a loose confederation of laboratories is on the verge of making that challenge a reality. Papers published online in Nature¹ and Nature Genetics² this week report the sequencing of nearly 100 A. thaliana genomes, the first swathe released by the project; around 400 more have been sequenced, but are not yet ready for publication. Last week, Ecker’s group at the Salk Institute in La Jolla, California, won a US$2-million grant from the National Science Foundation (NSF) to polish off another 500 strains, and to catalogue expressed RNAs and map DNA methylation, a chemical modification that affects gene expression.

Arabidopsis thaliana, or thale cress, is a small 
weed with a simple genome that stands in as a 
genetic reference for plants with more complex 
genomes. The genome project aims to uncover 
genetic changes that enable plants to adapt to 
their local environments. There are thousands 
of strains of A. thaliana in stocks worldwide, 
each of which might carry unique traits that 
helped it to thrive in its natural environment 
— tolerance for drought, perhaps, or defences 
against viral pathogens. “If you learn which 
genes are important for these traits, you could 
Isotope factory accelerates


US nuclear scientists find silver lining in economic downturn — lower construction costs. 


BY EUGENIE SAMUEL REICH 


Given the tough economic times, good news was the last thing that nuclear scientists expected at an 18–20 August meeting for potential users of the Facility for Rare Isotope Beams (FRIB), a planned national facility for nuclear physicists, which will be run under the auspices of the US Department of Energy (DOE). Instead, they heard that the downturn may have an upside: construction of the US$614.5-million facility, originally slated to begin in 2013, could be brought forward a year. Cash-strapped Michigan State University in East Lansing, where FRIB is based, has decided to pump in money while construction costs remain low.

“I'm delighted,” says Ani Aprahamian, a physicist at the University of Notre Dame in Indiana who hopes to use FRIB to study short-lived isotopes that are key to the production of heavy elements in stars. “I have students whose futures depend on these studies, and it would be great to be able to do them sooner.”

FRIB, expected to serve around 800 users a year, will accelerate ionized atoms down a 500-metre-long series of tunnels folded around like a paper clip and then shatter them against a graphite target to produce beams of rare isotopes at higher intensity than at any other facility in the world. The fragments could include thousands of isotopes that are predicted but have never been seen on Earth (see ‘Probing instability’).

PROBING INSTABILITY

US nuclear scientists have dreamt of such a facility since the late 1980s, hoping that studying the lifetimes, masses, excited states and structure of rare isotopes might shed light on fundamental questions in nuclear physics and astrophysics. A report from the National Academy of Sciences endorsed the idea in 2007, and in 2009 the DOE's Office of Science signed a cooperative agreement with Michigan State University to build the accelerator by 2020, with a $520-million commitment from the DOE and another $94.5 million from the university.

The proposal to accelerate construction and be ready to begin doing science as early as 2018 is largely the brainchild of physicist and FRIB's project manager Thomas Glasmacher, who says that while thinking of ways to protect the project he came up with the idea of locking in construction costs at low prices. “It’s a really good time to build things,” says Glasmacher. “People are out of work and that’s driving prices down.” Calculating that once the US economy rebounded, construction costs would rise fast, he pitched the case to the university. In February the university president agreed,


The planned Facility for Rare Isotope Beams (FRIB) will generate isotopes that have predicted but previously 
even though a Michigan state budget proposed the same month would cut $69 million in state support for the university. Glasmacher notes that he didn't ask for any extra money. The university will simply allocate $15 million of the promised budget a year earlier than planned. The next challenge for Glasmacher will be persuading the DOE to agree to the accelerated schedule, which he will propose at a peer-review meeting later this month.

Moving FRIB ahead would help remedy the loss of the Holifield Radioactive Ion Beam Facility at Oak Ridge National Laboratory in Tennessee, which the DOE has announced it will stop funding (see Nature 471, 278; 2011), a move to save $10.3 million per year. “The Holifield community supports FRIB,” says Witold Nazarewicz, scientific director of the Holifield facility. Aprahamian adds that speeding up FRIB will also enable US nuclear science to compete sooner with facilities already running or planned overseas. The higher intensity of FRIB’s beams should enable experimentalists to get reliable statistics on hundreds of isotopes at once, compared with a handful at other facilities.

FRIB is not the only science facility to have leveraged low construction costs. “We were fortunate to have negotiated our major contracts at a time when the construction industry was hit hard by the economic downturn,” says Steve Dierker, director of the $912-million National Synchrotron Light Source II at Brookhaven National Laboratory in Upton, New York, a future source of high-energy X-rays that is around 60% complete.

However, FRIB’s director, Konrad Gelbke, cautions that the whole project still depends on the availability of funding through Congress, and on clearing multiple hurdles at the DOE. A statement from the Office of Science says that a review is scheduled for the spring of 2012 to assess FRIB’s readiness to start construction, and that if the review is favourable, the DOE will consider approving construction. “DOE appreciates the continuing willingness of Michigan State University to be flexible in apportioning its part of the cost-share,” the statement says.

Glasmacher is confident that his approach is the right one and that the DOE reviewers will see that. “We're presenting it as an opportunity, because the quicker you do a project the cheaper it is,” he says.


Twins to probe
Moon’s heart


NASA mission will survey lunar gravity to map the dense rock beneath the surface.


BY ERIC HAND 


The Moon’s face is an open book, but its deeper nature is still a mystery. Probes from Europe, Japan, India, China and the United States have imaged the lunar surface in exquisite detail, mapped its minerals, looked for evidence of water and scouted for potential landing sites. Now Maria Zuber, principal investigator for NASA’s Gravity Recovery and Interior Laboratory (GRAIL) mission, which is set to launch on 8 September, wants to reveal the Moon's hidden history.

“I think we're going to find something quite surprising,” says Zuber, a geophysicist at the Massachusetts Institute of Technology in Cambridge. Her confidence stems from the maps of lunar gravity that GRAIL is set to provide, which should be orders of magnitude better than any before. The density variations they aim to reveal in the subsurface rock should shed light on the Moon's tumultuous and geologically active past, help to determine whether it has a liquid core and yield clues to the underlying structure of its giant impact basins, the lunar ‘maria’.

The GRAIL mission consists of twin spacecraft that are near replicas of the Gravity Recovery and Climate Experiment (GRACE), a pair of satellites that have orbited Earth since 2002, mapping the planet's gravity field so finely that they could see shifts in groundwater aquifers and ocean currents. Adapting that flight-tested technology helped to keep the cost of GRAIL to US$496 million. Another saving came in mission design: instead of blasting straight to the Moon, the spacecraft will ease into a lunar polar orbit after a three-and-a-half-month journey, and only a modest amount of fuel will be needed for slowing down.

Apart from four cameras on each spacecraft, which will capture images for public outreach, GRAIL carries only one instrument, and it isn’t even pointed at the Moon. As the two spacecraft coast 55 kilometres above the lunar surface, and 60–225 kilometres apart, a high-frequency radio link will measure the exact distance between them. As one probe approaches a high-density object, such as a mountain, it will feel a slightly stronger gravitational force and will momentarily speed up, changing the separation from its companion. For GRAIL to deliver on its promise,
will require taking into account the gravitational influence of distant planets, the movement of tectonic plates under tracking stations on Earth and even the pressure of sunlight on the spacecraft’s solar panels, says Zuber.

The Moon's gravity has already been mapped less precisely, by measuring, from Earth, the shifts in the speed of a single lunar orbiter, but this is impossible when the orbiter goes behind the Moon. SELENE (Kaguya), a Japanese mission launched in 2007, mapped gravity on the far side with the help of a relay satellite that orbited at a higher altitude, within radio-sight of both Earth and SELENE. But Zuber says that GRAIL's maps will be far more precise, even better than GRACE’s maps of Earth, because the effects of Earth's atmospheric drag mean that the GRACE spacecraft have to orbit at altitudes ten times higher than GRAIL will.

One science target will be the Moon's deep structure. By bouncing lasers off reflectors left by the Apollo astronauts, researchers have picked up hints of a subtle wobbling, suggesting the presence of a soft core. Zuber says GRAIL should be able to confirm those hints, and might also find surprising additions to the core, for instance by discerning whether compounds such as titanium oxide crystallized and sank into the core when the Moon was initially a ball of magma.

The overall findings could illuminate how planets in the inner Solar System cooled into layered structures. “It transcends just knowing about the Moon,” says geologist Brad Jolliff of Washington University in St Louis, Missouri. “It helps us to understand how other rocky bodies differentiate.” By probing the rocks around impact basins, Zuber says, GRAIL should also help modellers to understand the dynamics of giant impacts.

NASA’s plans for exploring the Moon after GRAIL’s 90-day mission are uncertain, says Chip Shearer, a planetary geologist at the University of New Mexico in Albuquerque and chair of NASA’s Lunar Exploration Analysis Group. A wave of missions designed to support NASA’s now-defunct Constellation programme, which envisioned returning humans to the Moon by 2020, is tapering off (see ‘Moon rush’). The last of these, the Lunar Atmosphere and Dust Environment Explorer (LADEE), a mission to measure the effects of the Moon's fine dust, is due to launch in 2013. But two lunar science priorities aren't being addressed. A proposed mission, led by Jolliff, to return samples from the Moon’s largest impact basin, near the south pole, lost out to a mission to return an asteroid sample. And advocates for the International Lunar Network, once a NASA-led endeavour to set up seismic detectors on the Moon’s surface, have been told that they will have to compete for NASA’s support with everybody else's Solar System proposals.

“We know what, in terms of lunar science, should come next,” says Shearer. “But we don't know what will come next.”
The wealth of lunar missions in the
past two decades is not set to last. 


NASA's Clementine hints at existence 
of water at lunar south pole (pictured). 


NASA's Lunar Prospector allows 
gravity map of the Moon’s near side. 


United States unveils ‘vision’ to return 
astronauts to the Moon. 


SMART-1, Europe’s first lunar orbiter, 
launched. 


Japan’s SELENE (Kaguya) produces 
gravity map of the far side. 


Launch of Chang’e, China’s first lunar 
orbiter. 


Google supports US$30-million Lunar 
X Prize for first private team to put a 
rover on the Moon. 


India’s Chandrayaan-1 finds evidence 
of watery minerals. 


NASA's Lunar Reconnaissance Orbiter 
finds water in shadowed polar craters. 


Chang’e 2 launches. 


NASA human space policy revised to 
target asteroids before the Moon. 


GRAIL due to launch.


NASA's Lunar Atmosphere and Dust 
Environment Explorer (pictured, 
below) is scheduled to launch. 
Red tape puts chill on


Siberian research 


High-profile carbon project to proceed, but with a proviso. 


BY QUIRIN SCHIERMEIER 


One of Russia’s most prominent international science projects has fallen foul of cold-war-era concerns. An expedition to study carbon transport around Siberia’s Yenisey River has been postponed for a year after officials blocked the use of sampling equipment and put some sites off-limits. The episode highlights the tension between Russia’s bureaucracy and its growing ambition to develop wider research collaborations, part of a strategy to revitalize domestic science.

In July, Ernst-Detlef Schulze, a carbon-cycle 
researcher and founding director of the Max 
Planck Institute for Biogeochemistry in Jena, 
Germany, and 35 scientists from Germany, the 
Netherlands, France and Russia, had their bags 
packed for the journey to Siberia. 

Then they received a letter from the Russian Federal Security Service prohibiting them from using any Western equipment on the trip. Schulze, who last year received a 150-million-rouble (US$5-million) grant from the Russian government to do research in Siberia, was aghast. “When I first saw the letter I just couldn't believe what I read,” he says. “I was so disappointed and furious I went to my own forest and lumbered trees until I was exhausted.”

Western scientists participating in the 12-billion-rouble ‘mega-grant’ programme that funds Schulze (see Nature 473, 428–429; 2011) have previously complained about excessive bureaucracy and frequent problems with export control and customs, and have received support from the Russian science minister, Andrei Fursenko (see Nature 465, 858; 2010).

So Schulze and his co-workers at the Siberian Federal University in Krasnoyarsk used diplomatic channels to try to reverse the ruling. Equipment worth €600,000 (US$865,000) was already in Siberia, ready to be used to calculate the carbon budget of the Yenisey’s huge catchment area (see map). Siberia’s soils and forests serve as one of Earth's largest carbon sinks, but the carbon flux between ecosystems there has never been studied in detail.

At a hastily arranged meeting in Moscow on 4 August, Fursenko said that he regretted the situation and promised to try to find a solution. A week later, the expedition was given clearance to begin on 1 September, but with restrictions that Schulze found unacceptable. Sampling tools, for instance, could be handled only by

SIBERIAN SAMPLING 


Researchers will study the flow of carbon between land, water and air across various climate zones in Siberia.


Russian staff, most of whom are not trained to use the geochemical analysis equipment, says Schulze. The Russian Federal Services for Technology and Export Control also refused to approve the purchase of equipment in the West, such as a freeze-dryer to preserve samples.

“This is not the way to do science,” says Han Dolman, an environmental scientist at the VU University Amsterdam. Dolman had planned to study how carbon is transported from soils into the Yenisey River, and how much of the organic carbon dissolved in the river becomes carbon dioxide. Unlike on land, carbon exchange in aquatic systems is poorly understood.

On 12 August, Schulze threatened to cancel the expedition, but further negotiations yielded a compromise: the trip will take place next year, with the only restriction being a ban on taking samples from a roughly 20,000-square-kilometre area around Krasnoyarsk, where Russia operates a nuclear-reprocessing facility.

The saga shows that scientists do not yet have 
enough freedom in their research in Russia, says 
Schulze. “There's an absurdly opaque and often 
arbitrary bureaucracy at work. Thankfully, 
Fursenko was always clearly on our side.”
Scientists promised ‘one
voice’ in European policy 


ScienceEurope hopes to shift balance of power away from Brussels and towards researchers. 


BY NATASHA GILBERT 


Scientists often struggle to get their opinions heard above the din of voices competing to influence policy and research-funding decisions in the European Union. A new Brussels-based group, ScienceEurope, is now positioning itself as the scientists’ champion in the fight to sway decision-makers. “We will become the single voice for science in Europe,” says Paul Boyle, a member of the pilot board of the organization, which launches next month.

“If we speak in one voice it will be easier to 
see if our recommendations have influenced 
policy,” adds Boyle, who is chief executive 
of Britain’s Economic and Social Research 
Council in Swindon. 

ScienceEurope unites two science advocacy groups: the European Science Foundation (ESF) based in Strasbourg, France, and the European Heads of Research Councils (EUROHORCs) based in Berne. The two groups have common members, including Europe’s leading national research and funding organizations, such as the Helmholtz Association of German Research Centres, headquartered in Berlin, and Britain’s Medical Research Council in London. They also share similar goals and have previously worked together on policy development. Earlier this year they voted to join forces.

Once ScienceEurope holds its founding assembly in Berlin on 21 October, EUROHORCs will cease to exist. The ESF may continue as a separate body but will probably wind down its activities over the next few years.

Marja Makarow, a molecular biologist at the University of Helsinki and chief executive of the ESF, says that the merger was born out of a need for an organization that would have greater influence in Brussels and would be “flexible
WHO FUNDS EUROPE’S SCIENCE?

Despite having a major influence on research agendas, the European Commission (EC) allocates relatively little of Europe’s science funding.

[Pie chart of science funding by source, with segments including the European Commission, Germany, the Netherlands and Other; total $308.2 bn. Annual spending 2009. EC funding averaged from FP7 budget.]


enough to respond quickly to emerging issues”, features that Makarow feels neither the ESF nor EUROHORCs has. Being located outside Brussels and holding infrequent meetings have proved disadvantageous to both organizations. And, unlike ScienceEurope, the ESF had no mandate to speak on behalf of its members.

ScienceEurope’s pilot board will flesh out the organization's structure and strategy over the coming months. Boyle says that they plan to set up committees covering all research disciplines to guide the organization’s activities. They also hope to hold large annual meetings to bring together the plethora of other European and international science and university groups to discuss strategies and debate priorities.

One of ScienceEurope’s key goals is to help to build the European Research Area (ERA), a long-cherished ideal within the European Union that would allow researchers to move freely across borders, taking their funding with them. ScienceEurope’s membership base gives it strong links with research policy-makers at national levels, which could help it tackle the thornier problems of the ERA, such as transferring scientists’ pensions from one country to another.

Makarow also hopes that ScienceEurope can shift the balance of power in science policy-making away from the European Commission and back towards scientists in member states. She points out that the European Commission manages a tiny fraction of the funds spent annually across Europe (see graph), yet leads the debate on the direction of science policy. “The balance is not right,” she says.

Ernst Rietschel, former president of the Leibniz Association of German research institutes, agrees that European scientists need better-coordinated representation. But he and others are concerned that ScienceEurope will not be influential enough because, unlike the ESF, it will not disburse research funding. To have clout you “need money”, he observes.

Jean-Pierre Henriet, a geologist and emeritus professor at the University of Ghent in Belgium, is angry that the merger will result in the ESF ending its funding of collaborative research projects. Its annual budget previously provided more than €100 million (US$144 million), collected from member states.

The ESF grants provide “essential funding” for young scientists, who develop contacts and learn networking and leadership skills in collaborative projects, he says, and losing the grants will leave a “major gap” in the research-funding landscape.

“I understand their concerns,” says Makarow, “and hope that other instruments that provide funding at a European level will be developed to fill the gap”.


SOURCE: OECD/ 
DOES SCIENCE COME OUT THE WINNER?


THE 24/7 LAB 


BY HEIDI LEDFORD 


It’s just about midnight on a hot Friday night in July, Enrique Iglesias’ ‘Dirty Dancer’ is on the radio, and 26-year-old graduate student Sagar Shah is starting to look winded. The problem, he says, is not how late it is, or even that he has spent the past three hours working in a cramped sterile cell-culture hood. The problem is that the routine cell-culture maintenance he is doing, bathing his collection of rare human tumour cells with fresh medium, produces no data. And a lack of data, says Sagar, makes him “hungry” for it.

Next to Sagar, Lyonell Kone, a 22-year-old student, rises from another sterile hood and heads for the microscope, jostling his labmate Nathaniel Tippens out of the way. He squints at his cultures, checking to make sure the cells are growing at the right density. Satisfied, he backs away, gingerly places his flasks in an incubator, rubs his eyes and stretches. He’s finished for the night.

The weary waltz within this cramped cell-culture room is the only flicker of activity at this hour in the Koch Cancer Research Building at Johns Hopkins University in Baltimore, Maryland. It’s the Friday before the 4 July holiday, and even the night cleaners quit hours ago, leaving behind the faint smell of disinfectant and the occasional haunting beep of an autoclave echoing down silent hallways. But these members of neurosurgeon Alfredo Quiñones-Hinojosa’s laboratory are accustomed to being the last out of the building. In a lab where the boss calls you at 6 a.m., schedules Friday evening lab meetings that can stretch past 10 p.m., and routinely expects you to work over Christmas, sticking it out until midnight on a holiday weekend is nothing unusual.

Many labs are renowned for their intense work ethic and long hours. When I set out to profile such a laboratory, I wanted to find out who is drawn to these environments, what it is really like to work there and whether long hours lead to more or better science. I approached eleven laboratories with reputations for being extremely hard-working. Ten principal investigators turned me down, some expressing a fear of being seen as ‘slave-drivers’.

Number eleven, Quiñones-Hinojosa, had no such qualms. His work ethic is no secret: a 2007 essay in the New England Journal of Medicine¹ and several television and newspaper reports have traced his path from 19-year-old illegal immigrant from Mexico, labouring in the fields of California, to neurosurgeon at one of the United States’ leading research hospitals. He did not get there by working 9 to 5.

Quifiones-Hinojosa fondly recalls the long nights he worked alone 
in the laboratory as an undergraduate at the University of California, 
Berkeley, and again as a medical student at Harvard University in 
Cambridge, Massachusetts. When he was a resident at the University 
of California, San Francisco, his three young children thought he lived 
in the hospital. In effect he did, putting in 140 hours a week and grabbing 
10-minute naps when he could. Quiñones-Hinojosa credits his 
professional rise to his resilience and a seemingly limitless capacity for 
hard work. “When you go that extra step, you are training your brain 
like an athlete,” he says. And the fact that his group has published 
113 articles in the past six years and holds 13 funding grants is not, 
he says, because he is brighter or better connected than colleagues. 
“It’s just a matter of volume,” he says. “The key is we submit a couple 
of dozen grant applications a year, and we learn from our mistakes.” 

And so, at ease with hard work and the media and steeped in the 
long-hours culture of medicine, Quiñones-Hinojosa eagerly welcomed 
me into his research laboratory. “I would be delighted,” he said. 

The morning I arrived — at 8 a.m. sharp — Quiñones-Hinojosa 
insisted that I observe his first surgery of the day. He and his resident, 
Shaan Raza, were removing a pituitary tumour from a 54-year-old 
woman. The operating room is an extension of his laboratory, 
Quiñones-Hinojosa explained: it is there that he collects the tissue 
samples that his staff — with the patients’ consent — will immortalize 
in cell culture. They are the grist for the lab’s studies of how cancer 
stem cells fuel brain-tumour development and how tumour cells 
spread through the brain. 

Alfredo Quiñones-Hinojosa (centre, with surgery team) selects lab members who expect long hours and motivates them by inviting patients to lab meetings. 

Quiñones-Hinojosa had gone to bed at 1 a.m. the night before, 
and was up again at 5. Walking to surgery, he passes by Kone, who is 
climbing the stairs to the lab. “You ready to rock ’n’ roll?” Quiñones-Hinojosa 
asks reflexively as he walks by. Then he glances at the time 
and a mischievous smile darts across his face. “Hey, it’s 10 a.m.,” he 
calls over his shoulder, never breaking stride. “What are you doing 
coming in at 10 a.m.?” 

As we walk out of the building Quiñones-Hinojosa nudges me with 
his elbow and laughs: “See, now he’s going to go back to the lab and tell 
everyone, ‘Dr. Q caught me coming in at 10 a.m.!’” (Lab members, who 
in fact mostly arrive after 9 a.m., confirmed that Kone did exactly that.) 

Quiñones-Hinojosa is gregarious and charming, with an infectious 
energy and a habit of advertising his humility. But he also knows how 
intimidating he can be to the people who work for him, and he’s not 
afraid to capitalize on that. In 2007, just two years after he started at 
Hopkins, he rounded a corner in the cafeteria and saw his lab mem- 
bers sitting at a table, talking and laughing. When they caught sight of 
him, he says, they stopped, stood up, and went straight back to the lab. 

Quiñones-Hinojosa has another way to keep his lab motivated. 
Every so often, he asks a cancer patient or his or her family to join the 
lab meeting. It is a chance for the patients to learn about the research 
being done with their tumours. And for the lab, it is a reminder of the 
urgency of their work. Quiñones-Hinojosa draws out each patient’s 
personal story: how they found out they had cancer, how they felt 
when they got the news and how it has impacted on their family. 
Being confronted with all this can be a shock for researchers without 
medical training. “You can see it in their faces,” says Hugo Guerrero-Cazares, 
a research associate in the lab. “When someone says ‘I’m 
going to die in six months,’ it really hits them.” 

Back in the operating room, nurses and surgeons buzz about 
setting up equipment around the unconscious patient. Pituitary 
tumours can nestle between the two carotid arteries that supply the 
brain with blood, making the growths exquisitely difficult to remove. 
(Quiñones-Hinojosa says he woke up last night worrying about the 
operation and spent two hours practising every move of the surgery in 
his mind before nodding off again.) Normally the tumours are about 
the size of a pea; this one is closer to a golf ball. Quiñones-Hinojosa 
and Raza meticulously scoop out the tumour piece by piece. 

The surgery seems to be a success. Quiñones-Hinojosa steps back 
from the patient and makes sure the sample is labelled and stored 
appropriately on ice. He checks with the pathologists down the hall 
to make sure it includes the tumour tissue he wants, then sends it on 
to his lab: sample 872 in his collection. 


FAST FOOD 

In the laboratory, near lunchtime, endocrinology research fellow 
Nestoras Mathioudakis prepares the tissue in a sterile tissue-culture 
hood. While the cells are incubating with an enzyme that destroys 
contaminating red blood cells, he dashes out to eat a frozen meal. He 
practically lives on them, he says, but worries that the high salt content 
may be giving him searing headaches. One day, after eating about five 
frozen dinners, he sat down at a microscope and found it difficult to 
focus his right eye. Still, the meals are cheap, fast and a way to grab 
food without leaving the lab, and Mathioudakis predicts that today 
will be a busy, multi-frozen-dinner day. 

He doesn’t really mind. “Only people with a certain type of per- 
sonality would stay in a lab like this,” says Guerrero-Cazares, who 
has worked there for four years. The night before I arrived, Quiñones-Hinojosa 
was checking his e-mail on his way home when he noticed 
a message from a medical student at Rosalind Franklin University of 
Medicine and Science in Chicago, Illinois, who wanted to work in the 
laboratory. Quiñones-Hinojosa receives several such enquiries a day, 
but something about this student — Joshua Bakhsheshian — caught 
his eye. He fired back a message: give me a number at which I can reach 
you at 6 a.m. It was midnight. A minute later he had his reply. 

At 6:02 a.m. Quiñones-Hinojosa called Bakhsheshian. “I laid it on 
so thick for this guy,” Quiñones-Hinojosa crooned later that morning. 
“I said, ‘You’ve seen me on TV, you think I’m so nice. But you come 
into my lab, you’re going to work. The people in my lab, they work 
24 hours a day. They’re here over Christmas and New Year writing 
grants, and you will be, too.’” 

“That’s music to my ears,” replied Bakhsheshian, who later told 
me he had never expected such a speedy reply from the busy surgeon, 
and had stayed up much of the night frantically studying the 
lab’s publications. (Quiñones-Hinojosa later offered him a 
spot in the lab if Bakhsheshian could get a fellowship.) 

Not everyone whom Quiñones-Hinojosa selects adapts well to the 
rigours of his laboratory. Research fellow David Chesler, a neurosurgery 
resident at the University of Maryland in College Park with 
a PhD in neuroimmunology and circles under his eyes, recalls a technician 
who “wasn’t keeping up” — and Guerrero-Cazares recounts the 
tale of a colleague who simply stopped coming to the Friday night lab 
meetings. Both left the lab. Quiñones-Hinojosa says that he asked them 
to leave “very nicely”, and helped them to find positions elsewhere. 

Still, Quiñones-Hinojosa’s technique of screening for work habits 
and personality traits may be one reason why the lab runs so 
smoothly, despite its intensity. Pierre Azoulay, associate professor of 
strategy at the Massachusetts Institute of Technology’s Sloan School 
of Management in Cambridge, says that asking an employee to work 
long hours can backfire if that person is used to operating differ- 
ently. “Unless you select your trainees very carefully on those criteria 
— which I wager most principal investigators don’t — there would 
presumably be deleterious effects.” 

Another key is autonomy. Many members of the Quiñones-Hinojosa 
lab develop their own projects, and write the grant applications to 
fund them. They express a proud sense of ownership when it comes to 
their work. And despite the 6 a.m. phone calls from the boss — made 
during his commute to the hospital — they say they feel reasonably 
free to set their own schedules. Shah says that 20-hour days are not 
uncommon for him. But “I don't believe in clocking in and clocking 
out,” he says. “I could do that at Walmart and get overtime.” 

That freedom is essential to keeping researchers happy and 
productive, says Azoulay. “Science is a harsh mistress,” he says. “I 
think relatively few scientists are expecting 9-to-5 jobs. But they are 
expecting autonomy, and a principal investigator that violates that 
expectation could potentially run into problems.” 

So far, Quiñones-Hinojosa’s lab seems relatively problem-free. But 
are the long hours and personal sacrifices worth it, for the lab members 
and for science? In 2004, Steven Stack, a sociologist at Wayne State 
University in Detroit, Michigan, published an analysis of survey data 
collected by the US National Research Council on 11,231 PhD scientists 
and engineers working in academia². He found that the average scientist 
worked about 50 hours a week, and in general the more hours an indi- 
vidual put in, the more publications he or she cranked out. 

Quiñones-Hinojosa’s lab seems to fit that mould. Of the 113 articles 
he has published since he launched the lab in 2005, most are from a 
small ‘dry’ laboratory working on clinical outcomes in cancer. His 
27-person ‘wet’ lab has published 29. Overall, his h index — a measure 
of productivity that factors in the number of articles published and 
how often they are cited — is 27, compared with 10.7 for US neurosurgeons 
at the same associate professor level³. Quiñones-Hinojosa 
also notes that it takes researchers in his department an average of 
15 years to be promoted to full professor. He was recommended for 
a full professorship this year, after just six. 

Biochemist Philip Cohen of the University of Dundee, UK, says 
that of the 70 postdocs and nearly 50 students he has supervised dur- 
ing his career, the most successful were those who put in long hours 
and worked efficiently. Cohen frets that the lab culture is changing. 
“Everyone's told not to stress themselves or overdo things, and I could 
not disagree more,” he says. “I’m afraid they’re losing all the fun in life 
if they don’t really push themselves to the limit.” 

But not everyone agrees that more hours yield more results. Dean 
Simonton, a psychology researcher at the University of California, 
Davis, who has studied scientific creativity, says that the pressure 
for publications, grants and tenure may have created a single-minded, 
“monastic” culture in science. But some research suggests that highly 
creative scientists tend to have broader interests and more hobbies 
than their less creative colleagues, he says. Chemist Stephen Buchwald 
of the Massachusetts Institute of Technology urges the members of his 
lab to take a month’s holiday every 
year, and not to think about work when they're gone. “The fact is, I 
want people to be able to think,” he says. “If they’re completely beaten 
down, they're not going to be very creative.” His approach does not 
seem to have hurt productivity: Thomson Reuters declared Buchwald 
one of the most highly cited chemists from 1999 to 2009, with an 
average of more than 86 citations for his 171 papers. 

An intense work schedule also comes with personal costs that can 
be hard to measure. “The area in which I have failed the most is as a 
father,” Quiñones-Hinojosa readily admits. It is something he is trying 
to correct, by spending more time with his kids and shuttling them 
to swimming lessons (although phoning lab members on the way). 

And postdoc Pragathi Achanta looks wistful when she talks about 
her niece in India, who was six months old the last time Achanta 
saw her — now she’s nearly five. Achanta has been working on grant 
applications over the holidays, and hasn't had time to visit her family. 

Now, at 8 p.m. on Friday 1 July, Achanta is taking advantage of the 
unusually short lab meeting to prepare surgical tools for a mouse exper- 
iment to model the effects of radiotherapy on stem cells. She wants to 
be ready so that she can complete the surgeries quickly on Saturday 
morning before she leaves to help teach a course at Cold Spring Harbor 
Laboratory in New York. Later this year, grant schedule allowing, she 
hopes to travel to India to see her niece at last. But she admits to being 
nervous about broaching the subject with the boss. 

Quiñones-Hinojosa, though, says that he has nothing against 
holidays. “Vacations are great,” he says. “Take a weekend off.” 
SEE EDITORIAL P.5 AND COMMENT P.27 


Heidi Ledford is a reporter for Nature in Cambridge, Massachusetts. 


1. Quiñones-Hinojosa, A. N. Engl. J. Med. 357, 529-531 (2007). 
2. Stack, S. Res. Higher Ed. 45, 891-920 (2004). 
3. Lee, J., Kraus, K. L. & Couldwell, W. T. J. Neurosurg. 111, 387-392 (2009). 


© 2011 Macmillan Publishers Limited. All rights reserved 


FEATURE 


Scientists think they can prove that free will is an illusion. 
Philosophers are urging them to think again. 


BY KERRI SMITH 


THE EXPERIMENT HELPED TO CHANGE JOHN-DYLAN HAYNES’S OUTLOOK ON LIFE. 
In 2007, Haynes, a neuroscientist at the Bernstein Center for Computational Neuroscience 
in Berlin, put people into a brain scanner in which a display screen flashed a succession of 
random letters¹. He told them to press a button with either their right or left index fingers 
whenever they felt the urge, and to remember the letter that was showing on the screen when 
they made the decision. The experiment used functional magnetic resonance imaging (fMRI) 
to reveal brain activity in real time as the volunteers chose to use their right or left hands. The 
results were quite a surprise. 
“The first thought we had was ‘we have to 
check if this is real’,” says Haynes. “We came up 
with more sanity checks than I've ever seen in 
any other study before.” 

The conscious decision to push the button 
was made about a second before the actual act, 
but the team discovered that a pattern of brain 
activity seemed to predict that decision by as 
many as seven seconds. Long before the sub- 
jects were even aware of making a choice, it 
seems, their brains had already decided. 

As humans, we like to think that our deci- 
sions are under our conscious control — that 
we have free will. Philosophers have debated 
that concept for centuries, and now Haynes 
and other experimental neuroscientists are 
raising a new challenge. They argue that 
consciousness of a decision may be a mere 
biochemical afterthought, with no influence 
whatsoever on a person’s actions. According 
to this logic, they say, free will is an illusion. 
“We feel we choose, but we don’t,” says Patrick 
Haggard, a neuroscientist at University Col- 
lege London. 

You may have thought you decided whether 
to have tea or coffee this morning, for exam- 
ple, but the decision may have been made long 
before you were aware of it. For Haynes, this 
is unsettling. “I’ll be very honest, I find it very 
difficult to deal with this,” he says. “How can 
I call a will ‘mine’ if I don’t even know when it 
occurred and what it has decided to do?” 


THOUGHT EXPERIMENTS 

Philosophers aren't convinced that brain scans 
can demolish free will so easily. Some have 
questioned the neuroscientists’ results and 
interpretations, arguing that the researchers 
have not quite grasped the concept that they 
say they are debunking. Many more don’t 
engage with scientists at all. “Neuroscientists 
and philosophers talk past each other,” says 
Walter Glannon, a philosopher at the Univer- 
sity of Calgary in Canada, who has interests in 
neuroscience, ethics and free will. 

There are some signs that this is beginning 
to change. This month, a raft of projects will 
get under way as part of Big Questions in Free 
Will, a four-year, US$4.4-million programme 
funded by the John Templeton Foundation 
in West Conshohocken, Pennsylvania, which 
supports research bridging theology, philoso- 
phy and natural science. Some say that, with 
refined experiments, neuroscience could help 
researchers to identify the physical processes 
underlying conscious intention and to better 
understand the brain activity that precedes 
it. And if unconscious brain activity could be 
found to predict decisions perfectly, the work 
really could rattle the notion of free will. “It's 
possible that what are now correlations could 
at some point become causal connections 
between brain mechanisms and behaviours,” 
says Glannon. “If that were the case, then it 
would threaten free will, on any definition by 
any philosopher.” 


Haynes wasn’t the first neuroscientist to 
explore unconscious decision-making. In the 
1980s, Benjamin Libet, a neuropsychologist 
at the University of California, San Francisco, 
rigged up study participants to an 
electroencephalogram (EEG) and asked them to 
watch a clock face with a dot sweeping around 
it². When the participants felt the urge to move 
a finger, they had to note the dot’s position. Libet 
recorded brain activity several hundred 
milliseconds before people expressed their conscious 
intention to move. 

Libet’s result was controversial. Critics said 
that the clock was distracting, and the report 
of a conscious decision was too subjective. 
Neuroscience experiments usually have 
controllable inputs — show someone a picture at 
a precise moment, and then look for reactions 
in the brain. When the input is the participant’s 
conscious intention to move, however, they 
subjectively decide on its timing. Moreover, 
critics weren’t convinced that the activity seen by 
Libet before a conscious decision was sufficient 
to cause the decision — it could just have been 
the brain gearing up to decide and then move. 

Haynes’s 2008 study¹ modernized the 
earlier experiment: where Libet’s EEG technique 
could look at only a limited area of brain 
activity, Haynes’s fMRI set-up could 
survey the whole brain; and where Libet’s 
participants decided simply on when to move, 
Haynes’s test forced them to decide between 
two alternatives. But critics still picked holes, 
pointing out that Haynes and his team could 
predict a left or right button press with only 
60% accuracy at best. Although better than 
chance, this isn’t enough to claim that you 
can see the brain making its mind up before 
conscious awareness, argues Adina Roskies, 
a neuroscientist and philosopher who works 
on free will at Dartmouth College in 
Hanover, New Hampshire. Besides, “all it suggests 
is that there are some physical factors that 
influence decision-making”, which shouldn’t 
be surprising. Philosophers who know about 
the science, she adds, don’t think this sort 
of study is good evidence for the absence of 
free will, because the experiments are 
caricatures of decision-making. Even the seemingly 
simple decision of whether to have tea or 
coffee is more complex than deciding whether to 
push a button with one hand or the other. 

Haynes stands by his interpretation, and has 
replicated and refined his results in two studies. 
One uses more accurate scanning techniques³ 
to confirm the roles of the brain regions 
implicated in his previous work. In the other, which is 
yet to be published, Haynes and his team asked 
subjects to add or subtract two numbers from 
a series being presented on a screen. 
Deciding whether to add or subtract reflects a more 
complex intention than that of whether to push 
a button, and Haynes argues that it is a more 
realistic model for everyday decisions. Even in 
this more abstract task, the researchers detected 
activity up to four seconds before the subjects 
were conscious of deciding, Haynes says. 

Some researchers have literally gone deeper 
into the brain. One of those is Itzhak Fried, a 
neuroscientist and surgeon at the University of 
California, Los Angeles, and the Tel Aviv Medi- 
cal Center in Israel. He studied individuals with 
electrodes implanted in their brains as part of a 
surgical procedure to treat epilepsy⁴. Recording 
from single neurons in this way gives scientists 
a much more precise picture of brain activity 
than fMRI or EEG. Fried’s experiments showed 
that there was activity in individual neurons of 
particular brain areas about a second and a half 
before the subject made a conscious decision to 
press a button. With about 700 milliseconds to 
go, the researchers could predict the timing of 
that decision with more than 80% accuracy. “At 
some point, things that are predetermined are 
admitted into consciousness,” says Fried. The 
conscious will might be added on to a decision 
at a later stage, he suggests. 


MATERIAL GAINS 

Philosophers question the assumptions 
underlying such interpretations. “Part of 
what’s driving some of these conclusions is 
the thought that free will has to be spiritual or 
involve souls or something,” says Al Mele, a 
philosopher at Florida State University in Tal- 
lahassee. If neuroscientists find unconscious 
neural activity that drives decision-making, 
the troublesome concept of mind as separate 
from body disappears, as does free will. This 
‘dualist’ conception of free will is an easy target 
for neuroscientists to knock down, says Glannon. 
“Neatly dividing mind and brain makes it 
easier for neuroscientists to drive a wedge between 
them,” he adds. 

The trouble is, most current philosophers 
don’t think about free will like that, says Mele. 
Many are materialists — believing that everything 
has a physical basis, and decisions and actions 
come from brain activity. So scientists are weighing in on 
a notion that philosophers consider irrelevant. 

NATURE.COM 
To listen to a podcast about neuroscience and free will, visit: 
go.nature.com/ihlh5z 

Nowadays, says Mele, the majority of 
philosophers are comfortable with the idea 
that people can make rational decisions in a 
deterministic universe. They debate the 
interplay between freedom and determinism — the 
theory that everything is predestined, either by 
fate or by physical laws — but Roskies says that 
results from neuroscience can’t yet settle that 
debate. They may speak to the predictability 
of actions, but not to the issue of determinism. 

Neuroscientists also sometimes have 
misconceptions about their own field, says Michael 
Gazzaniga, a neuroscientist at the University of 
California, Santa Barbara. In particular, 
scientists tend to see preparatory brain activity as 
proceeding stepwise, one bit at a time, to a final 
decision. He suggests that researchers should 
instead think of processes working in parallel, 
in a complex network with interactions 
happening continually. The time at which one 
becomes aware of a decision is thus not as 
important as some have thought. 


BATTLE OF WILLS 

There are conceptual issues — and then there 
is semantics. “What would really help is if 
scientists and philosophers could come to an 
agreement on what free will means,” says 
Glannon. Even within philosophy, definitions of 
free will don’t always match up. Some 
philosophers define it as the ability to make rational 
decisions in the absence of coercion. Some 
definitions place it in cosmic context: at the 
moment of decision, given everything that’s 
happened in the past, it is possible to reach a 
different decision. Others stick to the idea that 
a non-physical ‘soul’ is directing decisions. 

Neuroscience could contribute directly to 
tidying up definitions, or adding an empirical 
dimension to them. It might lead to a deeper, 
better understanding of what freely willing 
something involves, or refine views of what 
conscious intention is, says Roskies. 

Mele is directing the Templeton Foundation 
project that is beginning to bring philosophers 
and neuroscientists together. “I think if we do 
a new generation of studies with better design, 
we’ll get better evidence about what goes on 
in the brain when people make decisions,” he 
says. Some informal meetings have already 
begun. Roskies, who is funded through the 
programme, plans to spend time this year in 
the lab of Michael Shadlen, a neurophysiologist 
at the University of Washington in Seattle who 
works on decision-making in the primate 
brain. “We’re going to hammer on each other 
until we really understand the other person’s 
point of view, and convince one or other of us 
that we’re wrong,” she says. 

Haggard has Templeton funding for a 
project in which he aims to provide a way to 
objectively determine the timing of conscious 
decisions and actions, rather than rely on 
subjective reports. His team plans to devise 
an experimental set-up in which people play 
a competitive game against a computer while 
their brain activity is decoded. 

Another project, run by Christof Koch, 
a bioengineer at the California Institute of 
Technology in Pasadena, will use techniques 
similar to Fried’s to examine the responses of 
individual neurons when people use reason to 
make decisions. His team hopes to measure 
how much weight people give to different bits 
of information when they decide. 

Philosophers are willing to admit that 
neuroscience could one day trouble the con- 
cept of free will. Imagine a situation (philo- 
sophers like to do this) in which researchers 
could always predict what someone would 
decide from their brain activity, before the 
subject became aware of their decision. “If that 
turned out to be true, that would be a threat to 
free will,” says Mele. Still, even those who have 
perhaps prematurely proclaimed the death of 
free will agree that such results would have to 
be replicated on many different levels of deci- 
sion-making. Pressing a button or playing a 
game is far removed from making a cup of tea, 
running for president or committing a crime. 

The practical effects of demolishing free 
will are hard to predict. Biological determin- 
ism doesn't hold up as a defence in law. Legal 
scholars aren't ready to ditch the principle of 
personal responsibility. “The law has to be 
based on the idea that people are responsible 
for their actions, except in exceptional circum- 
stances,” says Nicholas Mackintosh, director of 
a project on neuroscience and the law run by 
the Royal Society in London. 

Owen Jones, a law professor at Vanderbilt 
University in Nashville, Tennessee, who directs 
a similar project funded by the MacArthur 
Foundation in Chicago, Illinois, suggests that 
the research could help to identify an individual’s 
level of responsibility. “What we are interested 
in is how neuroscience can give us a more 
granulated view of how people vary in their 
ability to control their behaviour,” says Jones. 
That could affect the severity of a sentence, for 
example. 

The answers could also end up influencing 
people's behaviour. In 2008, Kathleen Vohs, a 
social psychologist at the University of Min- 
nesota in Minneapolis, and her colleague 
Jonathan Schooler, a psychologist now at the 
University of California, Santa Barbara, published 
a study⁵ on how people behave when 
they are prompted to think that determinism 
is true. They asked their subjects to read one 
of two passages: one suggesting that behaviour 
boils down to environmental or genetic factors 
not under personal control; the other neutral 
about what influences behaviour. The participants 
then did a few maths problems on a 
computer. But just before the test started, they 
were informed that because of a glitch in the 
computer it occasionally displayed the answer 
by accident; if this happened, they were to click 
it away without looking. Those who had read 
the deterministic message were more likely to 
cheat on the test. “Perhaps, denying free will 
simply provides the ultimate excuse to behave 
as one likes,” Vohs and Schooler suggested. 

Haynes’s research and its possible implica- 
tions have certainly had an effect on how he 
thinks. He remembers being on a plane on his 
way to a conference and having an epiphany. 
“Suddenly I had this big vision about the whole 
deterministic universe, myself, my place in it 
and all these different points where we believe 
we're making decisions just reflecting some 
causal flow.” But he couldn’t maintain this 
image of a world without free will for long. 
“As soon as you start interpreting people's 
behaviours in your day-to-day life, it’s virtually 
impossible to keep hold of,” he says. 

Fried, too, finds it impossible to keep deter- 
minism at the top of his mind. “I don't think 
about it every day. I certainly don’t think about 
it when I operate on the human brain.” 

Mele is hopeful that other philosophers will 
become better acquainted with the science of 
conscious intention. And where philosophy is 
concerned, he says, scientists would do well to 
soften their stance. “It’s not as though the task 
of neuroscientists who work on free will has to 
be to show there isn’t any.” 


Kerri Smith is editor of the Nature Podcast, 
and is based in London. 


1. Soon, C. S., Brass, M., Heinze, H.-J. & Haynes, J.-D. Nature Neurosci. 11, 543-545 (2008). 
2. Libet, B., Gleason, C. A., Wright, E. W. & Pearl, D. K. Brain 106, 623-642 (1983). 
3. Bode, S. et al. PLoS ONE 6, e21612 (2011). 
4. Fried, I., Mukamel, R. & Kreiman, G. Neuron 69, 548-562 (2011). 
5. Vohs, K. D. & Schooler, J. W. Psychol. Sci. 19, 49-54 (2008). 


1 SEPTEMBER 2011 | VOL 477 | NATURE | 25 




COMMENT 




A healthy work-life balance 
can enhance research 


Scientists should make time for play to complement their intense work, maintain 
creativity and keep the ideas flowing, argues Julie Overbaugh. 


Striking a balance between work, family, 
friends and leisure is often hard in 
science. But there must be room for 
those who want this balance, otherwise 
creative people with the potential to make 
significant contributions to scientific 
discovery will be excluded. 

In my college years at the University of 
Connecticut in Storrs, I juggled a chemistry 
major with playing on First Division college 
basketball and tennis teams. And, ingrained 
from my Irish upbringing, I felt that college 
was also meant to be a time where some 
evenings were spent in the pub with friends. 
I brought art classes into the mix in graduate 
school. From these experiences, I learned to 
value efficiency, balance and teamwork, all 
of which have influenced my approach to 
running my research programme. Today I 
direct a US lab of about 15 people and share 
responsibility for a much larger international 
team focused on HIV prevention research. 
During my training, I experienced many 
different lab styles and, in the process, realized 
that I was not cut out for clocking long hours 
for the rest of my life. I decided to try to do 
science in a manner that I could integrate into 
my life, leaving time for an extensive network 
of friends and family and other interests, and 
accept that there would be other paths if this
one did not work — a crucial mindset for not
succumbing to a frenzied work life. 

Twenty years on, I have found that 
spending long hours in the lab or at the 
computer does not necessarily promote 
the creative thinking that is integral to sci- 
entific discovery. In fact, I have many of 
my best ideas while walking the dogs in the 
morning, riding my bike home from work 
or weekending in the mountains. 


QUALITY NOT QUANTITY 

I have no objection to people who choose to
do long hours in the lab, but I have also never
expected lab members to give up other
aspects of their lives to pursue science.
One of my team recently noted: “You expect
high-quality work — but you do not measure
someone’s efforts by time spent in the lab
in the evenings or on weekends, rather you 
measure them by the quality of their data.” 

To be a successful scientist there are times 
when it is important to pull out all the stops 
— when a big grant deadline is looming or 
a high-impact paper is wrapping up. Some- 
times, when we are competing with other 
labs on an exciting story, I briefly imagine 
locking everyone in the lab to try to push for 
results more quickly. 

By resisting the temptation to drive my 
group so hard, we might have ended up with 
the second or third paper on a topic in a less 
prestigious journal on occasion. But our 
contributions are clearly recognized. For 
example: I have given plenary talks at every 
major HIV meeting; we have received two 
National Institutes of Health (NIH) MERIT 
awards; former trainees are faculty mem- 
bers at Harvard University in Cambridge, 
Massachusetts, Stanford University in Cali- 
fornia, Baylor College of Medicine in Hou- 
ston, Texas, and many other top research 
institutions; others are successful govern- 
ment scientists at the NIH and the Centers 
for Disease Control and Prevention. 

More importantly, in many cases, we 
solved a problem more effectively, and thus 
gained some advantage, because people in 
the lab were less stressed by long hours, con- 
stant demands and excessive expectations. In 
my view, an unremitting pace with no time 
to step back leads, over the long term, to a 
fatigued and unhappy team that is not oper- 
ating at its best. 


TIME FOR TEA 

For the past two decades, I have worked in a 
highly productive interdisciplinary and inter- 
national collaboration with scientists from 
many cultures. We are committed to reducing 
the burden of HIV, particularly in developing 
countries, and many on the team regularly see 
the devastation of HIV in clinical practice. 

But colleagues also have a range of 
demands on their time, including young 
faculty members who juggle research with 
clinical work and raising children or helping 
ageing parents. The original collaboration 
was driven by the needs of the science; it is 
sustained by esprit de corps. The glue is our 
appreciation of the specific needs and life- 
style choices of each group member, includ- 
ing our trainees. 

I am also fortunate to work at an institution 
where the focus is on our contributions, not 
our hours. When a colleague recently suffered 
a devastating accident, many faculty members 
and staff were regularly by her hospital bed for 
months, placing their presence there at higher 
priority than their work. Thus, it is noteworthy
that, among 209 faculty members,
we have three Nobel laureates and numerous
others who have received prestigious awards 
(go.nature.com/zebxjt). 

Balancing work and other aspects of life 
is becoming harder in science, as in other 
professions. We are expected to be constantly 
responsive to e-mail, alert to rapid online 
publications, to manage increasing adminis- 
trative and regulatory demands, and to devote 
more and more time to securing funding. 

Tighter funding and ease of travel have 
made ‘being on tour’ an integral part of being 
the lab chief or principal investigator (PI), 
often to the exclusion of spending time with 
the team. Being a lab head can be more PR 
than PI, with the pressure to ‘sell your goods’ 
on the road. I realized that attending just a few 
meetings a year was enough when I reached 
the point that I had heard my colleagues speak 
so many times that I could have given some of 
their talks myself, and they mine. 

Meetings are wonderful for networking 
and exchanging new findings and ideas, but 
they can have diminishing returns in terms 
of promoting creativity or seeding a break- 
through. We should re-evaluate the expand- 
ing repertoire of conferences in some fields 
to determine if the scientific exchange genu- 
inely offsets the demands of constant travel. 

More generally, time to think deeply 
about scientific problems is becoming 
increasingly rare. Indeed, with the rapid 
advance of new technologies, many studies 
simply apply new methods to old problems, 
often getting a more refined, but similar 
answer. Although it is comforting to see 
the consistency in conclusions, the true 
advances in such cases may be small. 

About 25 years ago, I was the lone trainee 
in a lab of a prominent senior scientist. We 
spent many afternoons drinking tea and 
discussing diverse scientific issues. Cur- 
rent scientific discovery would benefit from 
reinstating this mostly bygone tradition. 
Imagine, further, if labs everywhere shut 
down all experiments for a week each year 
to have everyone read recent — and old — 
literature and discuss ideas over a favourite 
beverage. I predict that new ideas would 
emerge and people would be energized. 

It is a privilege to have work that is 
challenging and satisfying, and that contrib- 
utes to society and helps advance knowledge 
and training. For some, a total focus on 
research and the thrill of discovery is fulfill- 
ing. For others, originality, productivity and 
overall happiness in life demand a different 
mix. There must be room for both types of 
people in science, as each has the potential to 
make important contributions. ■ SEE EDITORIAL
P.5 AND NEWS FEATURE P.20


Julie Overbaugh is in the Division of Human 
Biology, Fred Hutchinson Cancer Research 
Center, Seattle, Washington 98109, USA. 
e-mail: joverbau@fhcrc.org 
Redefining nature 


Shahid Naeem compares two books that call for us to 
embrace the influence of humans on ecosystems. 


Rambunctious Garden: Saving Nature in a Post-wild World
EMMA MARRIS
Bloomsbury: 2011. 288 pp. £20, $25

Authenticity in Nature: Making Choices about the Naturalness of Ecosystems
NIGEL DUDLEY
Earthscan: 2011. 256 pp. £19.99, $34.95


In Mary Shelley’s novel, Victor Frankenstein
assembled parts of dead humans
into a creature he brought to life by
unorthodox scientific methods. The resulting
being, although human in form and function,
was seen as a monster. If we were to assemble
a thing made up of plants, animals and
microbes, and then breathe ecological life
into it, would it be nature or an unnatural
monster? Two authors examine this issue:
Emma Marris in Rambunctious Garden and
Nigel Dudley in Authenticity in Nature.

If one defines ‘natural’ nature as assemblages
of native species that occupy large,
unbroken areas without obvious human
influence, and that exhibit normal (1,000-year
average) rates of extinction, origination
and ecosystem function, then there is no such
thing on Earth today. The destruction of the
natural world began some 2 million years
ago, with burning and hunting by Homo erectus.
In the past 50,000 years, Homo sapiens
took these skills to new heights, culminating
in the Industrial Revolution, in which some
25 million square kilometres of grasslands
were burned, 12 million square kilometres of
forests vanished, and floral and faunal extinctions
skyrocketed worldwide. If the loss of the
natural world is nothing new, does it matter?
Both authors think not.

Dudley opens Authenticity in Nature by
recounting his boyhood experiences at the
Attenborough Nature Reserve near Nottingham,
UK. Now leased by a mining company
to a wildlife trust, it was for centuries grazing
land. To visitors, it is nature. Marris
goes a step further in Rambunctious Garden,
suggesting that birds and bees on New York's
Fifth Avenue constitute nature. I could go one
better: my formative childhood experiences
of nature were the dioramas of the American
Museum of Natural History in New York. My mind’s eye
breathed life into the stuffed, encased creatures;
for me, they defined nature.

NATURE.COM See Nature’s biodiversity special: go.nature.com/wexhgi

Both authors review what is natural and 
unnatural from an ecological perspective, 
but their approaches are different. Marris, 
an accomplished science writer who regu- 
larly writes for Nature, follows the tactic of 
David Quammen (The Song of the Dodo;
1996), Stuart Pimm (The World According
to Pimm; 2001) and Jonathan Weiner (The
Beak of the Finch; 1994). In this, her
first book, she brings together her
many travels and encounters to create
a Kerouac-like journey through
which her thesis emerges. She posits that significant
parts of the modern world are or will
become “rambunctious gardens” — unruly 
entanglements of weedy species that follow 
in our wake. Because nature is ever changing, 
on the scale of Earth’s history, rambunctious 
gardens are as legitimate as any other mani- 
festation of nature. We should embrace our 
creations, not shun them as monsters. 

Marris makes effective use of the popu- 
lar voice, but sometimes goes overboard. 
Phrases such as “North American mam- 
moths went kaput” do injustice to the gravity 
of such issues. Similarly, frequent references
to “ecologists”, as in her claim that most ecologists
dislike anything that “reeks of mankind”,
belie the broad spectrum of views
that such researchers hold.

Dudley’s book is more sombre, similar 
to Simon Levin's Fragile Dominion (1999), 
E. O. Wilson’s The Future of Life (2002) 
or Carolyn Merchant’s Reinventing Eden 
(2003). An established environmental sci- 
entist, Dudley also uses his travels to bring 
colour to his writing, but his hypothesis 
comes from a thoughtful examination of 
various attempts to define ‘natural’ and 
‘wild’ — by scientists, philosophers, man- 
agers, non-governmental organizations 
and policy-makers. Dudley demonstrates 
that there is neither coherence nor consen- 
sus as to what constitutes naturalness or 
wildness. He suggests that we focus instead 
on “authenticity”. 

For nature to be authentic, Dudley posits, 
it need only contain a web of interacting spe- 
cies that provides stable ecosystem functions 
and services. The islands of Hawaii, for exam- 
ple, are inhabited by humans, have suffered 
innumerable extinctions and are riddled with 
non-native species. Yet they remain complex 
and productive ecosystems — not natural,
perhaps, but authentic Hawaiian ones.
Being authentic does not involve having 
native or endemic species or being devoid 
of people, and is thus a more tractable 
environmental goal than achieving natu- 
ralness or wildness. 

Neither volume tries to dismiss ‘natural
nature’ as the cause célèbre of conservation;
rather, both encourage adding
unnatural nature to that which we seek 
to preserve. Well protected places that 
are rich in endemics are important, but 
rambunctious gardens and authentic 
ecosystems are crucial too. 

Take a modern New England forest. Its 
ecology consists of garlic mustard from 
Europe, thorny Japanese barberry, intro- 
duced earthworms, soil that is enriched 
with industrial nitrogen, exotic insect 
pests such as hemlock woolly adelgids, 
Asian longhorned beetles and emerald 
ash borers, and an ungodly number of 
ravenous deer who know no serious 
predation. It is a rambunctious garden. 
It is polluted, stunted, diseased, unsta- 
ble and has no top predators. But it is 
teeming with life. It is home to surviving 
native species; exhibits ecosystem func- 
tions such as storing carbon and cycling 
nutrients; and even provides some eco- 
system services by stabilizing the hilly 
slopes and supplying deer for hunters in 
the region. This is authentic nature in the 
Anthropocene epoch. 

Yet ecological theory still holds, even if 
it is complex and inconvenient. If nature 
is any set of interacting species, through 
which energy flows and nutrients cycle, 
then it calls for saving too many species 
and setting aside too much land. Yet I 
would caution against making intelli- 
gent-design-like arguments that would 
dispense with ecology, to replace it with 
something simpler. Nature without ecol- 
ogy is like biology without evolution; 
neither is viable, neither makes sense. 

Marris and Dudley challenge us to 
revisit the definition of nature in our 
increasingly unnatural world. But mod- 
ern ecosystems that are haphazardly 
assembled from the remains of human 
development are unpredictable and frag- 
ile. With one billion people hungry, two 
billion poor and three billion in desper- 
ate need of water, the hopes of human- 
ity rest on conserving, restoring and 
sustainably managing the services that 
nature provides. The bigger question is 
whether the unnatural nature that we 
have wrought, although familiar in form 
and function, will save us or prove to be 
monstrous. ■


Shahid Naeem is professor of ecology at 
Columbia University, New York, USA. 
e-mail: sn2121@columbia.edu 
Building the Large Hadron Collider proved tricky, not least because of fears it would create tiny black holes. 


PARTICLE PHYSICS 


Inside the collider 


Joseph Silk enjoys an eloquent take on the Higgs boson, 
supersymmetry and the world’s largest particle smasher. 


Mentioning particle physics may
kill many dinner parties, but
that has not deterred its funders.
By the end of 2010, more than €7 billion 
(US$10 billion) had been ploughed into the 
current world-leading machine in experi- 
mental particle physics — the Large Hadron 
Collider (LHC) at CERN, Europe's high- 
energy physics lab near Geneva, Switzerland. 
So it behooves the researchers involved to 
communicate the relevance of the LHC’s sci- 
ence goals to the public. 

Lisa Randall’s Knocking on Heaven's Door is 
the latest attempt to do so. Her eloquent book 
details the trials and tribulations of the LHC, 
from conception to implementation, and 
takes us on a grand tour of the underlying 
science. Randall, a professor of physics at Harvard
University in Cambridge, Massachusetts,
and a leading contributor to particle-physics
theory, borrows her title from Bob Dylan’s 
soundtrack to the 1973 Sam Peckinpah film, 
Pat Garrett and Billy the Kid. The film is a 
lament on the death of a gunslinger — and 
the book’s title may be a reference to the pre- 
diction that turning on the LHC would result 
in the destruction of Earth. Fortunately, as 
Randall describes, this did not happen. 

That prediction provides a measure of the
LHC’s reach, and its hold on the public’s imagination.
Physicists’ ulti-
scopic objects could
be recreated in sufficiently
high-energy particle collisions with a
powerful particle accelerator such as the LHC. 
However, most particle physicists doubt that 
they will actually see such events — Stephen 
Hawking predicted that, owing to quantum 
physical effects, microscopic black holes 
should decay in a fraction of a nanosecond. 
But even Hawking might be fallible. Rich- 
ard Feynman famously said that “nobody 
understands quantum mechanics”. If 
Hawking was wrong, an escaping black hole 
might suck up its surroundings: the LHC 
itself and Geneva (to which humanity could 
no doubt adapt) and even Earth. Pursuing 
this logic, a teacher in Hawaii combined 
forces with a Spanish writer in 2008 to file a 
lawsuit against CERN, the US Department of 
Energy and the US National Science Foun- 
dation that threatened to block the start-up 
of the LHC. 

As Randall describes, scientists responded 
with fervour. It turns out that nature pro- 
vides an answer to such concerns. Cosmic 
rays pervade space and bombard Earth 
continuously. Their energies extend to bil- 
lions of times that achievable by the LHC. 
Had microscopic black holes been created 
in high-energy collisions of cosmic-ray par- 
ticles, Earth and the stars would have been 
swallowed up long ago. Physicists could 
relax: the LHC risk-assessment exercise was 
favourably resolved. 

On 20 November 2009, the LHC first 
powered up for experiments. By the end of 
2012 it will reach a high enough energy to 
test the standard model of particle physics 
and to detect the Higgs boson, the elusive, 
mass-giving ‘God particle’ — if it exists. 
Knocking on Heaven’s Door describes how 
that discovery would confirm one of the 
great predictions of physics. In parallel, 
the LHC will search for physics beyond the 
standard model. One of the most anticipated 
signatures will be that of supersymmetry, a 
new field that provides a candidate particle 
for dark matter. 

Given her background, Randall naturally 
complements her discussion of the LHC by 
describing ongoing searches for dark matter 
that are mostly led by particle physicists. For 
them, the driving question is: what is it? But 
Randall largely ignores astronomers’ con- 
tribution to the problem — namely, giving 
the empirical motivation for dark matter (it 
is the dominant form of matter in the Uni- 
verse) and mapping its location. 

The LHC could resolve the greatest mys- 
teries of the Universe: one microscopically 
small, and the other macroscopically large. 
But suppose physicists fail to detect any sign 
of the Higgs boson or supersymmetry? Will 
we have wasted those billions? Failure would 
shift the goal posts. Exploration of the next 
particle-physics frontier will require more 
powerful, more expensive and less attain- 
able machines. But we would also be unsure 
as to how high we would need to go in terms 
of energy or luminosity to achieve a break- 
through in new physics. Visionary ideas 
would be needed. 

Let us hope that the LHC does find some- 
thing. And that, regardless of the outcome, 
the inspired efforts of its builders will com- 
bine with theorists’ dreams to develop new 
and affordable probes of the ultimate horizons
of the Universe. ■


Joseph Silk is professor of physics at the 
Institut d'Astrophysique, Université Pierre et 
Marie Curie, Paris, France. 

e-mail: silk@astro.ox.ac.uk 


Books in brief 


Survivors: The Animals and Plants that Time Has Left Behind 

Richard Fortey HARPER PRESS 400 pp. £25 (2011)
Cataclysms come and go, but the stromatolites of Western Australia
have sat them out for more than 2 billion years. These organic
cushion-like structures with cyanobacterial wigs lead palaeontologist
Richard Fortey’s cast of survivors still dangling from the tree of life.
He roves from hordes of horseshoe crabs in Delaware Bay on the
northeast US coast to New Zealand’s velvet worms and beyond, each
fascinating organism a focus for broader thoughts on evolutionary
history. Decades spent “looking at thoroughly dead creatures” have
not dimmed Fortey’s ability to bring these relics to life.


My Beautiful Genome: Discovering Our Genetic Future, One Quirk
at a Time

Lone Frank ONEWORLD 320 pp. £10.99 (2011) 

As consumer-led genomics ramps up, questions of ethics and efficacy 
proliferate. Neurobiologist Lone Frank looks at how exposing our 
DNA affects our lives. Having interviewed James Watson and covered 
the rise of personal genomics from 2008, Frank puts her own genes 
to the test. She charts the range of applications — deep ancestry, 
disease, behaviour and personality, mental illness and partner 
compatibility — and concludes that, far from being a straitjacket, 
unveiling our ‘invisible self’ liberates, connects and reassures. 


1493: How Europe’s Discovery of the Americas Revolutionized 
Trade, Ecology and Life on Earth 

Charles C. Mann GRANTA 544 pp. £14.99 (2011) 

Journalist Charles Mann chronicles how Christopher Columbus’ 
second New World expedition in 1493 triggered a global upheaval. 
European vessels left sheep, rats and lethal viruses in the New World 
and carried tomatoes, tobacco and maize (corn) to the Old. Millions 
of people died from introduced diseases and ecosystems convulsed. 
A world economy emerged, propelled by trade in commodities from 
silk to slaves. Drawing on new research, Mann reframes the past 
500 years to riveting effect. 


The Genius in my Basement: The Biography of a Happy Man 
Alexander Masters FOURTH ESTATE 352 pp. £8.99 (2011) 

In 2007, writer Alexander Masters — author of Stuart: A Life 
Backwards (2006) — lived above the distinguished mathematician 
Simon Phillips Norton in Cambridge, UK. Norton helped to devise 
the ‘monstrous moonshine’ conjecture, about a mathematical 
symmetry group in thousands of dimensions known as the Monster; 
he is also an eccentric who obsesses about buses and Bombay mix. 
Masters, with his background in maths and physics, has written a 
fond yet merciless portrait that attempts both to dissect the Monster 
and to do justice to an extraordinary character. 


The Quest for Frank Wild 

Angie Butler JACKLEBERRY PRESS 224 pp. £25 (2011) 

Antarctic exploration is synonymous with heroes such as Ernest 
Shackleton, Robert Falcon Scott and Roald Amundsen. Few of us 
have heard of Frank Wild, Shackleton’s ‘right-hand man’, who had 
pivotal roles in five Antarctic expeditions and is one of only two men 
to be awarded a Polar Medal with four bars. After seven years tracking 
Wild’s fate, writer Angie Butler redresses the balance. Her account of 
his life includes a coup: Wild’s memoir of four expeditions, including 
Nimrod and Endurance, is published here for the first time. 

Jim Ottaviani’s comic-strip biography of Richard Feynman conveys the physicist’s colourful personality. 


Q&A Jim Ottaviani 
Comic creator 


Jim Ottaviani is the author of several comic books about famous scientists. His latest, with 
illustrations by Leland Myrick, covers the life of physicist Richard Feynman, who is known 
for his bongo playing and enthusiastic lectures as much as his work on quantum mechanics. 
Ottaviani explains why a graphic-novel format is a perfect match for such a zany character. 


Why did you decide to write comic books about science?
In 1997 I was looking for comics I wanted
to read. I love a good Spider-Man story, but
after you have read a hundred you don't
really need another. I
had been a nuclear engineer, then worked as
a librarian. Interested in the names behind
the equations and discoveries, I read the
biographies of physicist Niels Bohr and
others. There were some great stories there.
That got me wondering: why aren't they in
comics?


Why did you choose Richard Feynman? 

Every pantheon needs a trickster god. We 
are now far enough removed from the 
early foundations of quantum physics that
we mythologize the people and events
involved. And we like to have characters 
in our mythologies. Feynman slots beauti- 
fully into that. He worked hard to make his 
personality accessible to a broad audience, 
when many of his peers did not. Feynman 
was astute, even aggressive, about creating 
a myth about himself, turning his life into a 
sort of performance art. 


Was Feynman’s interest in drawing and diagrams also relevant?
With a scientific audience, sometimes
you have to defend the use of images. Flip
through Physical Review: there are a lot of
pictures. We commu-
him about the beauty of a flower being
accessible to both the artist and the scientist 
in different ways, and how they add to each 
other. So the comic format is valid. 


Why did you decide not to present his life 
chronologically? 

We asked: what would Feynman do? In his 
books, Feynman wrote in short scenes and 
acts, presenting some things out of order. 
A continuous narrative — born, lived, died 
— did not serve the person or the story. So 
using anecdotes seemed natural. Choos- 
ing the anecdotes, choosing when to break 
the continuity, that was when it got more 
difficult. Our intention was to make him 
seem more human. 


The words and pictures in your book tell 
different aspects of the story. Why did you 
choose such a challenging format? 

Leland and I assumed that the readership 
would be willing to read both the words and 
the pictures. We wanted to bring something 
new, to enrich the experience. Otherwise, 
why do another book about Feynman? We 
wanted to go beyond James Gleick’s 1992 
book Genius or Feynman's own stories. 


What is the subject of your next book? 

It is aimed at a young readership, and is 
about primatologists Jane Goodall, Dian 
Fossey and Biruté Galdikas, with a good 
helping of their mentor Louis Leakey. The 
book is targeted at readers aged between 10 
and 12, mainly girls. It is nowhere near as 
in-depth as Feynman, because it is about 
half the length and covers three people. It 
is about becoming a scientist, and what it 
means to be one in a world that may not be 
prepared for you to do that sort of thing, as 
a woman. 


Do you also touch on that issue in the 
Feynman book? 

Yes, with Feynman’s sister, Joan Feynman. 
We give a hint of the difficulties she faced in 
pursuing her career in astrophysics. 


What are you trying to show in your books? 
Feynman created a world in science that 
he enjoyed living in. Then he realized that 
he could have more by creating another 
world, where he appreciates art, plays the 
bongos, acts in plays or sometimes acts the 
fool. And this is what I have been trying to 
show with all the books — the humanness 
of science. Science is a serious endeavour. 
It is the old Spider-Man saying: “With great 
power comes great responsibility.” But at 
the same time, scientists should appreciate 
the joy of it. I hope scientists will get from 
Feynman the idea of a full life, lived 
creatively, and the fun it can produce. ■
Loophole in forest 
plan for Indonesia 


Last year, Indonesia and Norway 
signed the Oslo Pact, which will 
pay Indonesia up to US$1 billion 
to reduce carbon emissions by 
advancing forest-conservation 
initiatives. As part of the deal, 
Indonesia must halt the licensing 
of new agricultural plantations 
and logging concessions on 
peatlands and natural forest for 
two years. Clearing and logging 
must instead be directed to
non-forest ‘degraded lands’ and to
existing concessions. But the pact 
has a big loophole. 

Indonesia is the world’s third- 
largest emitter of greenhouse 
gases, caused mostly by 
rampant felling or burning of its 
rainforests and carbon-rich peat- 
swamp forests. The loss of these 
ecosystems also threatens major 
hot spots of global biodiversity. 
The hope is that the Oslo Pact 
and follow-on carbon payments 
can stem this tide. 

However, President Susilo 
Bambang Yudhoyono of 
Indonesia has issued a two-year 
moratorium on new concessions 
for clearing or logging of 
peatlands and natural primary 
(old-growth) forest. Contrary to 
the Oslo Pact, vast expanses of 
selectively logged forests — which 
sustain substantial carbon stores 
and much biodiversity — are 
classed as ‘degraded’ and left out 
of the moratorium altogether. 
The net effect is that these natural 
forests could be re-logged or 
cleared for oil palm and pulpwood 
plantations. According to its 
Ministry of Forestry, Indonesia 
has 35.4 million hectares of 
logged forest that can be cleared, 
considerably more than the upper 
estimate of 20 million hectares of 
primary forest protected under 
the moratorium. 

Many protected forests are in 
steep, mountainous areas that face 
little threat. The most imperilled 
forests, in the lowlands, are largely 
excluded from the deal because 
they have been logged previously. 
On top of this, the moratorium
fails to protect shallow peatlands
from conversion, or to prevent primary
forests and deep peatlands from
being cleared for sugar cane,
one of the most rapidly expanding
biofuel crops.

We urge Norway to insist that 
logged forests and clearance for 
sugar cane be included under the 
moratorium. Without this, the
pact is little more than business as
usual in Indonesia.

David P. Edwards, William F.
Laurance James Cook University,
Cairns, Queensland, Australia. 
dave.edwards@jcu.edu.au 


Call to save science 
institute in Turkey 


On 15 July, the Turkish Scientific 
and Technological Research 
Council (TUBITAK) effectively 
closed down the Feza Gürsey
Institute for Basic Sciences in
Istanbul by relocating it to a
TUBITAK cryptology institute 
in Gebze. More than 1,500 
signatures were collected by 
mid-August to ask the science 
minister, Nihat Ergün, to
reconsider this decision. 

The Feza Gürsey Institute,
named after an eminent Turkish 
physicist, has been crucial in the 
training of Turkish researchers. 
In the words of Marta Sanz-Solé, 
president of the European 
Mathematical Society, it is central 
to the “consolidation of scientific 
international collaborations”. 

The institute has a remarkable 
research record in theoretical 
physics and mathematics that 
spans 14 years, with 350 articles 
published in high-profile journals 
and more than 2,000 citations. It 
has hosted international meetings 
and free summer schools for 
thousands of Turkish participants. 

The move seems to reflect
the low value that TUBITAK
places on basic research and
on its relation to applied
research and technology.

Signatories in the campaign 
to save the institute include 
more than 100 prominent 
physicists and mathematicians, 
as well as the presidents of
the US, European and French
mathematical societies, and
of Turkey’s Mathematical
Society, Physical Society and
Astronomical Society
(http://savefezagursey.wordpress.com).
Ayse Erzan Istanbul Technical 
University, Turkey. 
erzan@itu.edu.tr 

Cihan Saçlıoğlu Sabanci
University, Turkey. 


Drug firm monitors 
waste water 


At AstraZeneca we are proactively 
addressing the problem of 
pharmaceuticals entering the 
environment as a result of our 
manufacturing discharges 
(Nature 476, 265; 2011). 

Using ecotoxicity data and our 
knowledge of environmental 
fate and the local environment, 
we have identified long- and 
short-term concentrations of 
active pharmaceutical ingredients 
that we refer to as Environmental 
Reference Concentrations 
(ERCs) and Maximum Tolerable 
Concentrations (MTCs), 
respectively. These should not 
be exceeded in the aquatic 
environments that receive effluent 
from our manufacturing sites. 

This approach is based on 
established environmental 
quality standards used in national 
and international legislation. 
Under this voluntary initiative, 
we have so far established ERCs 
and MTCs for 30 of our active 
pharmaceutical ingredients 
to protect the freshwater 
environment (algae, invertebrates 
and fish), top predators (fish- 
eating mammals such as otters), 
the marine environment 
(for coastal discharges) and 
humans. Other research-based 
pharmaceutical industries also 
have voluntary initiatives to 
control their discharges. 

We have ‘ecopharmacovigilance’
procedures in place to ensure 
that our ERCs continue to 
take into account all relevant 
data and current scientific 
understanding of the fate and 
effects of pharmaceuticals in 
the environment. For example,
if a new, lower, no-effect
concentration is reported in the 
peer-reviewed literature and is 
scientifically robust, we revise 
our ERCs and environmental risk 
assessments accordingly. 

Last year, we started a 
programme to monitor our own 
emissions against ERC and MTC 
values for our worldwide sites 
that could discharge waste water 
containing active pharmaceutical 
ingredients during peak 
production. We are now starting 
to share the ERC approach with 
our third-party manufacturers, 
with a view to including them in 
the programme. 

Jason Snape, Chris Lewis, 
Richard Murray-Smith 
AstraZeneca, Brixham 
Environmental Laboratory, UK. 
richard.murray-smith@astrazeneca.com

Competing financial interests


Diploma database to
encourage mobility 


Researchers moving abroad often 
need to have their qualifications 
recognized by a local university 
or other national institution. 
This costly process can take 
months, and may include thesis 
re-evaluation by a panel of 
professors or researchers. Add 
to this the extensive paperwork 
already required for applications 
for jobs, research grants or 
scholarships, and the need to cut 
red tape to encourage mobility 
within the scientific community 
becomes more pressing. 

Governments and universities 
should create an international 
online database that details the 
qualifications of applicants from 
foreign universities, together with 
a summary of the work entailed 
and the standard expected. 

Decisions by institutions 
responsible for recognizing 
diplomas awarded abroad would 
then be just a few clicks away. 
Ana M. C. Santos Federal
University of Goiás, Goiânia,
Brazil. ana.margarida.c.santos@googlemail.com


OBITUARY


Baruj Benacerraf 


(1920-2011) 


Immunologist who won Nobel for genetics of T-cell antigen recognition. 


Baruj Benacerraf bestrode immunology
for five decades. He created the
intellectual framework leading to
our present understanding of how T lym- 
phocytes recognize antigens, for which he 
received the 1980 Nobel Prize in Physiology 
or Medicine. He crafted world-renowned 
centres of immunology at multiple institu- 
tions, oversaw the flowering of a premier 
cancer centre, and was a remarkable mentor 
to generations of immunologists. He died, 
aged 90, on 2 August 2011. 

Born in 1920 in Caracas, Venezuela, to 
Sephardic Jewish parents, Benacerraf spent 
his youth in Paris and emigrated to the 
United States in 1940. He did his under- 
graduate studies as a premedical student at 
Columbia University in New York, where 
he met Annette Dreyfus, also a French Jew- 
ish émigré. They married in 1943, a love 
match that ended only in June this year with 
Annette’s death. She travelled everywhere 
with him and was a constant presence in the 
lab, softening his rough edges, reminding 
him to take his medicines, pampering him 
with his favourite French cookies and tea. 

After completing his MD at the Medical 
College of Virginia in Richmond, Benacer- 
raf served in the US Army in France and 
returned to New York City in 1947. He began 
his scientific career in 1948 at Columbia with 
Elvin Kabat, a leading figure in immunochemistry.
Family business responsibilities
took him to Paris in 1949 where, at the
Broussais Hospital, he carried out groundbreaking
studies on the ‘reticuloendothelial
system’, then the term for the system of white
blood cells (phagocytes) that ingest foreign 
particles and cell debris. Recruited to New 
York University (NYU) in 1956, he became 
the intellectual centre of a distinguished 
group of scientists and began the training 
activities that were to be one of the great 
achievements of his life. 

At NYU, Benacerraf proved to be an 
immunological polymath, uncovering the 
existence and diverse functions of antibody 
subclasses, identifying cellular receptors for 
these proteins and establishing the distinc- 
tive antigen-recognition properties of B and 
T lymphocytes. He also continued the theme 
of combining business and scientific respon- 
sibilities, serving as a bank director for most 
of his NYU tenure. 

His great work was the discovery that 
in a population of outbred guinea pigs, 
immune responses to simple antigens could
be mounted by some animals but not by
others. He showed that responsiveness was 
controlled by a genetic locus that determined 
whether the immune system could perceive 
the material and generate a functional 
response — a fundamental insight that led 
to his Nobel prize. These immune response 
(Ir) genes were later shown to be linked to the 
major histocompatibility complex (MHC) 
— which contains many immunity-related 
genes — and were eventually determined to 
code for the MHC molecules themselves. 


Benacerraf left NYU in 1968 and led the
National Institute of Allergy and Infectious
Diseases’s Laboratory of Immunology in 
Bethesda, Maryland, for two years, where 
his studies led to a fuller understanding of 
how Ir gene products mediate immunity. 
He also took the first steps that led to the 
laboratory becoming a major centre of 
immunology research. In 1970, he became 
chair of the pathology department at 
Harvard Medical School in Boston, Mass- 
achusetts, a position he held until 1991. 
On his arrival, the faculty included some 
immunological luminaries, but the school 
was not the world-leading centre of immu- 
nology it is today. Benacerraf recruited 
outstanding immunologists to various Har- 
vard hospitals and centres in addition to the 
‘Quadrangle’ faculty he led. His own efforts
to understand the significance of T-cell
recognition of peptide MHC complexes 
flourished and he continued his extraordi- 
nary record of training young scientists who 
later would become leaders in the field. In 
1980, Benacerraf was appointed president 
of the Sidney Farber Cancer Institute in 
Boston, now the Dana-Farber Cancer Insti- 
tute. His leadership transformed it into one 
of the premier cancer centres and medical 
research organizations in the world. He 
stepped down from the presidency in 1992. 

Those of us who passed through his labora- 
tory at Harvard were invited to spend idyllic 
summer days at Woods Hole in the time 
before e-mail and the web. We shared ocean 
breezes and excellent food with Baruj and 
Annette and had intense scientific discussions 
that he expected to be translated into concrete 
experiments immediately on our return. 

A rare combination of capacities made 
Benacerraf such a successful scientist and 
leader. His joy at uncovering some new aspect 
of the immune system was almost childlike: 
his impish smile, finger snapping and jig- 
ging footwork at ‘Aha!’ moments were well 
known to his colleagues and trainees. He had 
enormous native intelligence, remarkable sci- 
entific intuition and a tremendous capacity 
to recognize latent talent, as well as to inspire, 
motivate and guide the careers of those in 
whom he identified such potential. It was 
sometimes uncomfortable in the moment 
when one’s ‘buttons’ were being pushed, but 
in retrospect the value of such productive 
manipulation was always apparent. 

Indeed, Benacerraf was a man of enormous
personal warmth — an attribute not always 
appreciated by those outside his close 
scientific family. As pleased as he was with 
his achievements in science and scientific 
administration, he took every opportu- 
nity to remind those close to him that his 
main pride was in the cadre of scientists 
he trained. Their continuing scientific 
achievements are his greatest legacy.


How to escape treatment


Even during effective treatment with antiretroviral drugs, low levels of HIV persist. In part, this could be due to cell-to-cell 
transfer of multiple virions and the drugs’ inability to inhibit replication when virus levels are high. SEE LETTER P.95 


STEVEN G. DEEKS 


One of the great triumphs of modern
medicine is the development of combination
antiretroviral drug therapy
for the management of HIV infection. For 
those with access to these drugs, modern regi- 
mens can reduce the amount of circulating 
virus to very low levels. Despite their inherent 
potency, however, antiretroviral drugs are not 
curative, and HIV persists indefinitely. Con- 
sequently, patients have to adhere to these 
expensive and potentially toxic drugs for life. 
On page 95 of this issue, Sigal and colleagues
provide insights into why antiretroviral drugs
cannot fully inhibit HIV replication. Their data
have notable implications for the emerging
efforts aimed at curing HIV infection.

Several mechanisms might contribute 
to HIV persistence during antiretroviral 
therapy. These include maintenance of a 
transcriptionally silent (latent) HIV genome 
in long-lived, resting target cells such as CD4+
memory T cells; proliferation of these latently
infected cells; inadequate anti-HIV clearance
mechanisms; and ongoing de novo infection
of susceptible target cells through continued
viral replication. Although most researchers
agree on the involvement of the first three
mechanisms, the degree to which HIV can 
effectively replicate during therapy is a highly 
contentious issue. 

The arguments against persistent replication 
are that the virus does not evolve in peripheral 
blood, and that following intensification — the 
process whereby potent drugs are added to a 
stable regimen — the steady-state levels of HIV 
RNA in the plasma do not change.

Nevertheless, several lines of evidence support
ongoing replication. Preliminary studies
suggest that intensification may affect levels of
the virus in tissues known to be enriched in
HIV-susceptible target cells; the rapid reduction
in virus levels in response to an antiretroviral
drug can most easily be explained by
the inhibition of the ongoing complete cycles
of replication. Also, cells of many treated
individuals contain unintegrated episomal HIV
DNA (a potential marker of recent cell
infection). Finally, activated cells contain higher
levels of HIV DNA than resting cells do, an
observation that is more readily explained by
active cycles of infection than by preferential
activation of cells that harbour latent virus.


Figure 1 | HIV infection of target cells. HIV infects CD4+ memory T cells, which are scattered
throughout the body. a, Where these cells are sparse (for instance, at effector sites in the mucosa)
successful infection is likely to involve cell-free transfer of single virions (blue pyramids) to distant
target cells (arrow). b, But in areas of high T-cell density (such as inductive tissues in lymph nodes),
transmission is likely to involve direct transfer of multiple virions into adjacent cells. Sigal and colleagues’
findings suggest that antiretroviral drugs (small pink circles) readily inhibit cell-free transfer of single
virions, but may not completely stop cell-to-cell transfer of virions in target-rich areas.


Sigal et al. provide a compelling and
intuitive mechanism that might account for 
the failure of potent drugs to completely inhibit 
transfer of replication-competent virus. Using 
a complex mathematical model, they show that 
the drug concentration required to prevent a 
single transmitted virion from successfully 
infecting a target cell is much lower than that 
needed to stop multiple transmitted virus par- 
ticles from infecting the same cell. They then 
show in an in vitro system that physiologi- 
cally relevant drug concentrations can readily 
inhibit cell-free transmission (that is, transmission
of a free virus to a target cell) but not
cell-to-cell transfer of the virus (Fig. 1). These 
modelling and experimental findings are 
consistent, because cell-to-cell transmission 
is known to be associated with the transfer
of multiple virions to the target cell. 
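
The arithmetic behind this argument can be illustrated with a deliberately simple sketch. It is not the model used by Sigal and colleagues; it assumes, purely for illustration, that each transmitted virion succeeds independently and that a drug at a given concentration cuts each virion's chance of success by a factor of 1/(1 + concentration/IC50). The function name, the baseline probability of 0.5 and the dose of 50 times the IC50 are all hypothetical.

```python
def infection_probability(drug_conc, ic50, virions, base_prob=0.5):
    """Chance that at least one of `virions` transmitted particles infects a cell.

    Illustrative assumptions (not taken from the paper): virions act
    independently, and the drug scales each virion's success probability
    by 1 / (1 + drug_conc / ic50).
    """
    per_virion = base_prob / (1.0 + drug_conc / ic50)
    # The cell escapes infection only if every single virion fails.
    return 1.0 - (1.0 - per_virion) ** virions


if __name__ == "__main__":
    dose = 50.0  # hypothetical drug concentration, in units of IC50
    for m in (1, 10, 100):
        p = infection_probability(dose, ic50=1.0, virions=m)
        print(f"{m:>3} virions transferred: infection probability {p:.3f}")
```

Under these assumptions, a dose that blocks almost every single-virion transfer still leaves a sizeable chance of infection when a hundred virions arrive together, which is the qualitative point of the paragraph above.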

Sigal and co-workers further show that 
under conditions of maximal drug exposure, 
the replication ratio following cell-to-cell 
transfer was less than 1, yet greater than that 
predicted for complete suppression. This sug- 
gests that although replication may not fully 
account for HIV persistence, it is likely to bea 
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contributing factor. Finally, the authors argue 
through another mathematical model that 
low-level replication of the virus can replenish 
the virus reservoir even in the absence of obvi- 
ous evolution, because localized tissue-based 
chains of infection are independent, intermit- 
tent and unlikely to be linked temporally. This 
is yet another intuitive conclusion that is easy to 
place in the context of existing evidence, but like 
all the predictions in this paper it is exceedingly 
difficult to prove in vivo. 

I should emphasize that these observations
are not definitive. An important matter for 
the field is to define whether cell-to-cell virus 
transmission does indeed occur during effec- 
tive therapy. Given the logistics of accessing 
lymphoid tissues and their T cells in humans, 
it may be necessary to develop a robust non- 
human primate model in which animals are 
treated for prolonged periods. Such work 
could complement the ongoing studies aimed 
at defining the size and distribution of the HIV 
reservoir in humans. 

One might argue that the existing anti- 
retroviral regimens will probably be effective 
as long as a patient harbours drug-susceptible
virus and adheres to the correct dosage of the 
drugs. So why should we care whether there is
a small amount of difficult-to-prove de novo
infection of certain cells in lymphoid tissues? 

There is growing recognition that delivering 
antiretroviral drugs for life to everyone who 
might benefit from them is not going to be pos- 
sible. Despite the unprecedented global invest- 
ment in providing drugs, the number of newly 
diagnosed infections remains far greater than 
the number of people with access to therapy. As 
resources are likely to become more limited over 
time, we will not be able to treat our way out of 
this epidemic. And even if therapy is delivered, 
many people cannot fully adhere to it over time- 
scales of years to decades. These public-health 
and individual failures result in significant harm 
to the individuals (owing to untreated progres- 
sive HIV infection) and to the society (because 
untreated people are far more likely than treated 
people to transmit the virus to others).

The only way to fully address individual and 
public needs is to cure those infected with HIV. 
Given the existence of long-lived reservoirs of 
HIV, even complete inhibition of this virus’ rep- 
lication is unlikely to clear it completely from 
the body. But achieving complete inhibition is 
almost certainly going to be necessary for the 


ASTROPHYSICS 


many other curative strategies now being con- 
sidered’. Addressing the questions Sigal and 
colleagues’ raise in a more definitive in vivo 
experiment will be a challenge, but it is prob- 
ably among the most crucial tasks for those who 
conduct translational research in this area. = 
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A hint of normality 


at last? 


The chemical diversity of the oldest stars is greater than we thought. The 
discovery of an extremely iron-poor star with a ‘normal’ ratio of carbon to iron 
challenges our perception of early chemical enrichment. SEE LETTER P.67 


JOHN E. NORRIS 


The chemical abundances of the oldest
stars in the Galaxy hold a key to
what the Universe's conditions were
like at the earliest times. Indeed, the three 
most chemically primitive stars currently 
known have abundance patterns extremely 
different from those formed after the Universe’s
first billion years; for example, their
abundance ratio of carbon to iron is about 
10-1,000 times larger than is found at later 
times, indicating quite different conditions 
between the two epochs. On page 67 of this 
issue, Caffau et al. report a chemically primitive
star that is in some respects more ‘normal’
than these objects, indicating a larger
chemical diversity, which challenges our 
understanding of the first stars. 

Most astrophysicists agree that the Big Bang 
hypothesis provides the best description of the 
formation of the Universe. According to the 
standard version of this theory, a few minutes 
after the Universe began the only chemical 


elements were hydrogen, helium and lithium. 
At that time, some 13.7 billion years ago, 
their fractions by mass were 0.75, 0.25 and 
2.8x 10°, respectively””. Astronomers, on 
the other hand, have observed no stars devoid 
of elements more massive than lithium. The 
two most iron-poor stars, which are believed 
to have ages of approximately 13 billion years, 
have an observed iron abundance about 10°” 
that of the Sun — a small but well-determined 
amount. Further, the observed fractional 
lithium abundance in most stars that formed 
close in time to the beginning of the Universe, 
and in which the observed abundances should 
not have changed from their initial values, is 
8.3 × 10⁻¹⁰, some three times smaller than
predicted. Finally, in the three most iron-
poor stars (all with less than 10~*° solar iron 
abundance), carbon, nitrogen and oxygen are 
present in prodigious amounts relative to iron; 
and in one of them, the abundances of sodium, 
magnesium and aluminium relative to iron are 
at least 100 times those of the Sun. 

What do these observed abundances tell us?
One suggestion is that supernovae (the final
explosions of massive stars, which produce 
essentially all of the elements heavier than 
lithium and enrich the gas clouds from which 
later stars form) were very different within the 
first few hundred million years from those that 
followed, leading to the very different abun- 
dance patterns. Another possibility is that 
large overabundances of carbon and/or oxygen 
may have had a crucial role in determining the 
nature of the first stars to form. 

A hint of normality has been restored to 
the field by Caffau and colleagues’ discovery
of the extremely iron-poor dwarf star, SDSS 
J102915+172927, which has 10*° the iron 
abundance of the Sun. This value places it in 
the same range as the three most iron-poor 
stars mentioned above. Normality prevails in 
the sense that this star is not strongly carbon 
enhanced: the authors did not detect carbon 
in it, and the carbon-to-iron abundance ratio 
upper limit is not too different from the solar 
ratio. As Caffau et al. point out, the low carbon 
abundance in this object seems inconsistent 
with the prediction of Frebel et al. that large 
carbon and/or oxygen abundances are an essential
ingredient that provides cooling of the gas
clouds from which the early low-mass, long- 
lived stars we observe today were formed. 

The question that begs to be answered is: 
what does ‘normal’ mean during the Universe’s
first few hundred million years? Given
that the majority of these four most iron-poor 
stars is carbon-rich, should the carbon-rich 
stars not be considered normal and the ‘car- 
bon-normal’ object abnormal? More to the 
point, should one think in terms of two dif- 
ferent types of chemical-enrichment sources 
that produce the different observed chemical 
signatures (perhaps the ‘mixing and fallback’
type of supernova for the carbon-rich stars
and the standard ‘core-collapse’ supernovae
for the carbon-normal), or does one need
something quite different? Another intrigu- 
ing question is whether the ‘carbon-normal’ 
SDSS J102915+172927 is more primitive than 
the three carbon-rich members of the most 
iron-poor stars discussed here. 

There will be considerable interest in the 
lithium abundance of this star, which is any- 
thing but normal: Caffau et al. were unable 
to detect lithium in the spectrum of SDSS 
J102915+172927 (despite the expectation that 
it should be readily detectable) and report a 
lithium mass fraction of less than 6.8 × 10⁻¹¹,
more than 40 times smaller than the predictions
of Big Bang nucleosynthesis, the
process by which atomic nuclei were formed in 
the early Universe. Only one of the previously 
known three most iron-poor stars (HE 1327- 
2326) has an effective temperature at which 
stellar evolutionary effects are not expected 
to have greatly altered its original lithium 
abundance, and this star is also lithium deficient,
by a factor greater than 100. Thus, as
far as we know, all four could have been born 
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a small amount of difficult-to-prove de novo 
infection of certain cells in lymphoid tissues? 

There is growing recognition that delivering 
antiretroviral drugs for life to everyone who 
might benefit from them is not going to be pos- 
sible. Despite the unprecedented global invest- 
ment in providing drugs, the number of newly 
diagnosed infections remains far greater than 
the number of people with access to therapy. As 
resources are likely to become more limited over 
time, we will not be able to treat our way out of 
this epidemic. And even if therapy is delivered, 
many people cannot fully adhere to it over time- 
scales of years to decades. These public-health 
and individual failures result in significant harm 
to the individuals (owing to untreated progres- 
sive HIV infection) and to the society (because 
untreated people are far more likely than treated 
people to transmit the virus to others"). 

The only way to fully address individual and 
public needs is to cure those infected with HIV. 
Given the existence of long-lived reservoirs of 
HIV, even complete inhibition of this virus’ rep- 
lication is unlikely to clear it completely from 
the body. But achieving complete inhibition is 
almost certainly going to be necessary for the 
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many other curative strategies now being con- 
sidered’. Addressing the questions Sigal and 
colleagues’ raise in a more definitive in vivo 
experiment will be a challenge, but it is prob- 
ably among the most crucial tasks for those who 
conduct translational research in this area. = 
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discovery of an extremely iron-poor star with a ‘normal’ ratio of carbon to iron 
challenges our perception of early chemical enrichment. SEE LETTER P.67 


JOHN E. NORRIS 


he chemical abundances of the old- 

est stars in the Galaxy hold a key to 

what the Universe's conditions were 
like at the earliest times. Indeed, the three 
most chemically primitive stars currently 
known have abundance patterns extremely 
different from those formed after the Uni- 
verse’ first billion years — for example, their 
abundance ratio of carbon to iron is about 
10-1,000 times larger than is found at later 
times, indicating quite different conditions 
between the two epochs. On page 67 of this 
issue, Caffau et al.' report a chemically primi- 
tive star that is in some respects more ‘nor- 
mal’ than these objects, indicating a larger 
chemical diversity, which challenges our 
understanding of the first stars. 

Most astrophysicists agree that the Big Bang 
hypothesis provides the best description of the 
formation of the Universe. According to the 
standard version of this theory, a few minutes 
after the Universe began the only chemical 


elements were hydrogen, helium and lithium. 
At that time, some 13.7 billion years ago, 
their fractions by mass were 0.75, 0.25 and 
2.8x 10°, respectively””. Astronomers, on 
the other hand, have observed no stars devoid 
of elements more massive than lithium. The 
two most iron-poor stars, which are believed 
to have ages of approximately 13 billion years, 
have an observed iron abundance about 10°” 
that of the Sun — a small but well-determined 
amount. Further, the observed fractional 
lithium abundance in most stars that formed 
close in time to the beginning of the Universe, 
and in which the observed abundances should 
not have changed from their initial values, is 
8.3 x 10°", some three times smaller than 
predicted*. Finally, in the three most iron- 
poor stars (all with less than 10~*° solar iron 
abundance), carbon, nitrogen and oxygen are 
present in prodigious amounts relative to iron; 
and in one of them, the abundances of sodium, 
magnesium and aluminium relative to iron are 
at least 100 times those of the Sun. 

What do these observed abundances tell us? 
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One suggestion is that supernovae (the final 
explosions of massive stars, which produce 
essentially all of the elements heavier than 
lithium and enrich the gas clouds from which 
later stars form) were very different within the 
first few hundred million years from those that 
followed, leading to the very different abun- 
dance patterns. Another possibility is that 
large overabundances of carbon and/or oxygen 
may have had a crucial role in determining the 
nature of the first stars to form. 

A hint of normality has been restored to 
the field by Caffau and colleagues’ discovery’ 
of the extremely iron-poor dwarf star, SDSS 
J102915+172927, which has 10*° the iron 
abundance of the Sun. This value places it in 
the same range as the three most iron-poor 
stars mentioned above. Normality prevails in 
the sense that this star is not strongly carbon 
enhanced: the authors did not detect carbon 
in it, and the carbon-to-iron abundance ratio 
upper limit is not too different from the solar 
ratio. As Caffau et al. point out, the low carbon 
abundance in this object seems inconsistent 
with the prediction of Frebel et al. that large 
carbon and/or oxygen abundances is an essen- 
tial ingredient that provides cooling of the gas 
clouds from which the early low-mass, long- 
lived stars we observe today were formed. 

The question that begs to be answered is: 
what does ‘normal’ mean during the Uni- 
verse’ first few hundred million years? Given 
that the majority of these four most iron-poor 
stars is carbon-rich, should the carbon-rich 
stars not be considered normal and the ‘car- 
bon-normal’ object abnormal? More to the 
point, should one think in terms of two dif- 
ferent types of chemical-enrichment sources 
that produce the different observed chemical 
signatures — perhaps the ‘mixing and fallback 
type of supernova’ for the carbon-rich stars 
and the standard ‘core-collapse’ supernovae 
for the carbon-normal — or does one need 
something quite different? Another intrigu- 
ing question is whether the ‘carbon-normal’ 
SDSS J102915+172927 is more primitive than 
the three carbon-rich members of the most 
iron-poor stars discussed here. 

There will be considerable interest in the 
lithium abundance of this star, which is any- 
thing but normal: Caffau et al. were unable 
to detect lithium in the spectrum of SDSS 
J102915+172927 (despite the expectation that 
it should be readily detectable) and report a 
lithium mass fraction of less than 6.8 × 10⁻¹¹ 
— more than 40 times smaller than the pre- 
dictions of Big Bang nucleosynthesis, the 
process by which atomic nuclei were formed in 
the early Universe. Only one of the previously 
known three most iron-poor stars (HE 1327-2326) 
has an effective temperature at which 
stellar evolutionary effects are not expected 
to have greatly altered its original lithium 
abundance, and this star is also lithium defi- 
cient’, by a factor greater than 100. Thus, as 
far as we know, all four could have been born 
lithium-poor. Although lithium-deficient stars 
are not unknown in the Galaxy’s halo, which 
contains the oldest and more iron-poor stars in 
the system and in which the lithium deficiency 
is driven perhaps by their being members of 
binary systems*, they comprise only about 
5% of the Galaxy’s halo stars. The absence of 
lithium in the most iron-poor stars discussed 
here is an exciting and potentially fundamen- 
tal result. What has happened to the lithium 
created at the birth of the Universe? 
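
As a rough check on the figures quoted above, the short Python sketch below multiplies the reported upper limit on the lithium mass fraction by the stated factor of 40 to recover the Big Bang nucleosynthesis prediction that the comparison implies. It assumes the upper limit reads as 6.8 × 10⁻¹¹ (the exponent is damaged in this copy of the text); the function and variable names are illustrative and do not come from Caffau et al.

```python
# Upper limit on the lithium mass fraction of SDSS J102915+172927.
# Assumption: the value reads as 6.8e-11 (exponent damaged in this copy).
li_upper_limit = 6.8e-11

# The text describes this limit as "more than 40 times smaller" than the
# Big Bang nucleosynthesis (BBN) prediction.
deficit_factor = 40


def implied_bbn_fraction(upper_limit, factor):
    """Return the BBN lithium mass fraction implied by the quoted deficit."""
    return upper_limit * factor


implied = implied_bbn_fraction(li_upper_limit, deficit_factor)
print(f"Implied BBN lithium mass fraction: about {implied:.1e}")
```

Because the star's value is only an upper limit, the true deficit relative to the prediction could be larger than the factor used here.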

The caveat to the above discussion is, of 
course, the small number of currently known 
iron-poor stars that have less than 10°*” the 
solar iron abundance. Caffau et al.¹ comment 
that they expect 5-50 stars of similar (or lower) 
iron content to that of SDSS J102915+172927 
to be found in the Sloan Digital Sky Survey, 
in which they discovered this star. If they, and 
other currently planned surveys for the most 
metal-poor stars, are successful, the long- 
standing tyranny of small numbers will indeed 
have been overcome. ■ 
Nitrogen from the deep 


Ecosystems acquire nitrogen from the atmosphere, but this source can’t account 
for the large nitrogen capital of some systems. The finding that bedrock can also 
act as a nitrogen source may help solve the riddle. SEE LETTER P.78 


EDWARD A. G. SCHUUR 


In many parts of the world, the rates at which 
plants grow are controlled by the availability 
of nitrogen, an essential element for all life¹. 
Forest growth is fuelled by the nitrogen capital 
contained in soils and biomass. Like money 
in a bank account, this capital increases slowly 
within ecosystems over hundreds to thousands 
of years from the accumulation of tiny 
deposits of nitrogen that arrive each year from 
the atmosphere². But on page 78 of this issue, 
Morford et al.³ report that nitrogen-rich bedrock 
can double nitrogen input rates to forest 
ecosystems, which flourish as a result. 

Nitrogen is the fourth most abundant element 
in living organisms, and is used as a building 
block for critically important biological 
molecules such as amino acids and nucleic 
acids. In many ecosystems worldwide, nitrogen 
is the element whose supply rate from the 
environment is most limited. Because 
competition is fierce for this resource, nitrogen 
supply controls the behaviour of many 
organisms and shapes the structure and 
function of whole ecosystems. 

Most of the nitrogen needed by organisms to 
grow is supplied by recycling, in which 
decomposing organic matter releases nitrogen 
in forms that can be acquired by plants and 
microorganisms. Recycling, in turn, is 
dependent on the nitrogen capital that has 
accumulated over time in an ecosystem. New 
inorganic nitrogen is deposited into ecosystems 
abiotically in rainfall, or with the assistance of 
certain microbes that, individually or in close 
relationships with plants or fungi, convert inert 
atmospheric nitrogen gas into a form that 
organisms can use. 

The new deposits of nitrogen are small rela- 
tive to the quantities of the element that are 
recycled. But they are vital, not only as a source 
of nitrogen for newly forming ecosystems, but 
also because they sustain growth over centu- 
ries of ecosystem development by balancing 
natural nitrogen loss out of ecosystems into 


Figure 1 | Flourishing forests. Morford et al.³ report that conifer forests at South 
Fork Mountain, California, are enriched in nitrogen supplied by the underlying 
sedimentary rock. This nitrogen boost increases the above-ground biomass of the 
forest. 
streams or back to the atmosphere. Scientists 
have made detailed measurements of atmos- 
pheric nitrogen inputs in many places, but 
have sometimes encountered a puzzling phe- 
nomenon: the nitrogen capital within some 
ecosystems is larger than can be accounted for 
by known atmospheric sources’. 

Morford and colleagues’ discovery³ that 
bedrock can provide substantial quantities of 
nitrogen to organisms provides a new piece 
of the puzzle, and in doing so helps reshape 
our view of ecosystem nitrogen budgets. The 
authors compared plants and soils from for- 
est ecosystems in California that are similar in 
terms of their stand age, climate and soil type, 
but which grow on two different types of bed- 
rock. They found that tree species common to 
both sites were enriched in nitrogen in forests 
growing on soils derived from mica-schist (a 
type of marine sedimentary rock) compared 
with those in forests growing on soils derived 
from gabbro-diorite (a type of igneous rock). 
Tellingly, the sedimentary rock contained 
roughly ten times the levels of nitrogen found 
in the igneous rock. 

The authors found that the trees growing on 
the sedimentary rock not only 
had higher nitrogen levels, but 
also had more leaves than did 
trees growing on the igneous rock 
(Fig. 1). This presumably enables 
them to grow faster and results in 
a more productive forest. The elevated 
nitrogen levels in these trees 
corresponded to the measured 
nitrogen capital of the underly- 
ing soils, which are the direct 
source of the nitrogen to the 
forest — that is, the nitrogen capi- 
tal in these soils was twice that 
of conifer-forest soils overlying 
igneous rock. 

Nitrogen levels vary across for- 
est stands for many reasons, and 
so Morford et al. needed more 
evidence to show that bedrock 
was responsible for the observed 
variations. The authors there- 
fore took advantage of the fact 
that plants, soils and bedrock all 


1 SEPTEMBER 2011 | VOL 477 | NATURE | 39 


© 2011 Macmillan Publishers Limited. All rights reserved 


S. MORFORD 


contain measurably different amounts of ¹⁵N 
in their nitrogen pools. They found that, in the 
forest growing on nitrogen-rich sedimentary 
rock, the ¹⁵N content in both plants and soils 
matched that of the bedrock; this was not true 
for forests growing on the nitrogen-poor igne- 
ous rock, ruling out the possibility of significant 
nitrogen contribution from this rock. 
Although the nitrogen-isotope measure- 
ments helped build the case for sedimentary 
bedrock as a nitrogen source for forests, they 
alone were not a smoking gun. To extend the 
findings beyond the carefully matched for- 
est stands, the authors carried out a regional 
analysis of similar conifer forests in California. 
Sure enough, they found that the above-ground 
biomass of forests growing on nitrogen-rich 
sedimentary bedrock was almost 50% greater 
than that of forests on igneous bedrock, 
after accounting for differing ages of tree stands. 

The ‘imprint’ of nitrogen from bedrock 
on streams⁵ and soils⁶ has previously been 
reported for isolated sites in the same general 
region as the current study’, and so Morford 
and colleagues’ analysis makes the case for 
this as a regional pattern. But less than 2% of 
conifer-forest soils in that same region have a 
nitrogen capital as high as the sedimentary- 
bedrock forest that has been intensively studied 
by the authors (see Supplementary Information 
for ref. 3). This means that the high input of 
nitrogen from bedrock beneath that forest — 
which is equivalent to atmospheric nitrogen 
inputs — probably represents an upper estimate 
for the extent of this phenomenon. With 75% 
of Earth covered by sedimentary and related 
rock types’, there is a real need to explore the 
phenomenon beyond this region to determine 


Tumour-fighting virus homes in 


An early clinical trial demonstrates the delivery and replication of a cancer- 
killing virus in metastasized tumour tissue. These promising results could provide 
a foundation for systemic virotherapy for patients with cancer. SEE LETTER P.99 


EVANTHIA GALANIS 


Clinical advances in cancer research are often 
slow to materialize, in part because the efficacy 
of a treatment has 
to be balanced against its potential toxicity 
to normal tissues. Infection of tumours with 
oncolytic (cancer-killing) viruses has been 
explored as a new type of treatment that is not 
cross-resistant with approved cancer therapies 
and, being target-specific, may have fewer 
toxic side effects. On page 99 of this issue, 
Breitbach et al.¹ describe a phase I clinical trial 
in which an intravenously delivered oncolytic 
poxvirus was capable of replicating selectively 
in metastasized tumours. This is a milestone in 
the development of an effective oncolytic agent 
for systemic administration. 

Oncolytic viruses became a focus of atten- 
tion for cancer therapy following observations 
that natural viral infection or vaccination can 
lead to spontaneous regression of malignan- 
cies’. Unhindered by interferon-mediated anti- 
viral defence, which is compromised in many 
tumours’, these viruses specifically attack 
cancer cells by gaining entry through receptors 
that are overexpressed in these cells and/or by 
exploiting molecular pathways associated with 
malignant transformation for their replica- 
tion*”. As the virus starts to replicate at the 
tumour site, its destructive effect increases. 
Strategies are being devised to make this 
process even more efficient by deploying 
genetically engineered oncolytic viruses that 
carry therapeutic or immunomodulatory 
transgenes. 

In advanced cancer, systemic dissemination 
of solid tumours is linked with a poor progno- 
sis. Before oncolytic viruses can be used to treat 
such metastases, they must be able to reach and 
replicate in metastatic sites following intrave- 
nous administration. But there are obstacles to 
be overcome, including the antiviral immune 
response, and the uptake and destruction of 
the virus by the reticuloendothelial system 
in the liver and spleen. 

Breitbach et al.¹ take up the challenge using 
a genetically engineered oncolytic poxvirus 
known as JX-594. This is a smallpox-vaccine 
derivative of Wyeth-strain vaccinia virus car- 
rying an inactivated thymidine kinase gene to 
increase tumour specificity, and expressing 
two transgenes: one encoding human granu- 
locyte-macrophage colony-stimulating factor 
(GM-CSF) to stimulate anti-tumour immunity 
and the other β-galactosidase, a surrogate 
marker for detecting viral gene expression. 

The authors tracked the virus in 23 cancer 
patients, all with advanced solid tumours that 
were resistant to other treatments. Patients 
were each given one dose of JX-594 at one 
of six different dosage levels by intravenous 
injection; these were all well tolerated. The 
maximum feasible dose was 3 × 10⁷ plaque-forming 
units (PFU) per kilogram of body 
weight (corresponding to a total dose of about 
2 × 10⁹ PFU). This dosage is in line with doses 
of other oncolytic viruses that can safely be 
given intravenously, including adenovirus, 
reovirus, paramyxovirus (Newcastle disease 
virus and measles) and Seneca Valley virus. 
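
The relationship between the per-kilogram dose and the total dose quoted above can be checked with a short calculation. The Python sketch below assumes the exponents read as 3 × 10⁷ PFU per kilogram and 2 × 10⁹ PFU in total (the extracted text is damaged at that point); the function name is illustrative and is not taken from Breitbach et al.

```python
# Figures as read from the text (PFU = plaque-forming units).
# Assumption: the damaged exponents are 10^7 (per kg) and 10^9 (total).
max_dose_per_kg = 3e7   # maximum feasible dose, PFU per kilogram of body weight
total_dose = 2e9        # approximate total dose, PFU


def implied_body_weight_kg(total_pfu, pfu_per_kg):
    """Body weight (kg) at which the per-kilogram dose adds up to the total."""
    return total_pfu / pfu_per_kg


weight = implied_body_weight_kg(total_dose, max_dose_per_kg)
print(f"Implied body weight: about {weight:.0f} kg")
```

Under those assumptions the two figures are consistent with a typical adult body weight, which is what one would expect if the total dose is simply the per-kilogram dose scaled to the patient.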

Breitbach et al. demonstrated dose-dependent 
delivery of the virus (at 8-10 days 
after intravenous administration) to metastatic 
tumour deposits from a variety of tumour types, 
including leiomyosarcoma, mesothelioma, and 
lung, ovarian and colorectal cancers. In eight 
patients who had received 10⁹ PFU or more per 
dose, delivery and replication were confirmed 
by quantitative polymerase chain reaction in 
five patients and by immunohistochemistry 
using a polyclonal anti-vaccinia antibody in six 
patients: granular cytoplasmic staining evident 
in tumour tissue was indicative of replicating 
virus (viral factories; Fig. 1). 

Although JX-594 administration seemed to 
result in disease control in a dose-dependent 
way, with patients treated with the higher 
doses benefitting the most, viral infection and 
replication in metastatic deposits did not 
consistently affect clinical outcome. Some 
patients experienced clinical benefit — 
defined as disease stabilization for more than 
ten weeks — even when there was no evidence 
of viral replication in their tumour biopsies. 
By contrast, two out of six patients who were 
JX-594-positive by immunohistochemistry 
had progressive disease at first evaluation, 
even though replicating virus was detected in 
their metastatic tumours. 

The explanation for these discrepancies 
may be down to several factors. For example, 
patients were allowed only one viral dose and 
treatment cycle: as with other cancer therapies, 
it is unlikely that a single round of treatment 
would be enough to stop tumour growth. 
Sampling variability in patients, whether 
positive or negative for JX-594, may also have 
confounded the results. Reassuringly, the 
normal tissue of patients in whom replication 
was detected was negative for replication by 
immunohistochemistry. 

Figure 1 | Common oncogenic mutations in cancer cells encourage replication of the genetically 
engineered oncolytic JX-594 virus¹. The virus takes advantage of a cancer cell’s uncontrolled epidermal 
growth factor receptor (EGFR)-RAS signalling pathway. To replicate, this thymidine kinase (TK)-deficient 
virus relies on expression of TK by cancer cells. The newly assembled viruses then leave the 
cell to infect other tumour cells. These viruses also secrete GM-CSF, a factor that stimulates anti-tumour 
immunity. In normal cells, however, viral replication is blocked because this virus cannot efficiently 
exploit the cell’s replication machinery. 

The limitations notwithstanding, these 
results convincingly demonstrate success- 
ful dose-dependent delivery and replication 
of an oncolytic virus in metastatic disease 
sites, following intravenous administration in 
patients with primary solid tumours. Although 
oncolytic viral replication in metastatic 
disease sites after systemic administration 
has been reported before, those studies are 
undermined by detectable replication only in 
isolated patients or by methodology unable to 
distinguish properly between input and prog- 
eny virus. Promising preclinical data, how- 
ever, point to several strategies for enhancing 
systemic delivery of oncolytic viruses, includ- 
ing the use of cell carriers, cationic liposomes 
and polymers. 

Large randomized trials to test oncolytic 
viruses in cancer treatment are ongoing or 
soon to be activated. These will investigate 
the potential synergistic cytotoxicity between 
oncolytic viruses and more conventional thera- 
peutic approaches such as chemotherapy, small- 
molecule cell-cycle inhibitors, radiation therapy 
and anti-angiogenesis agents⁶,⁷. In addition, 
they will exploit induction of a systemic anti- 
tumour immune response in association with 
oncolytic tumour-cell death and expression of 
immunomodulatory transgenes”. 

Examples of such trials include the 
soon-to-be-completed phase III trial of an 
attenuated strain of herpes simplex virus-1 
that encodes GM-CSF in patients with 
metastatic melanoma; the recently activated 
phase III trial testing addition of reovirus to 
paclitaxel/carboplatin chemotherapy in 
patients with recurrent head and neck cancer; 
and a randomized phase II trial comparing 
JX-594 with the best supportive care in patients 
with hepatocellular carcinoma for whom 
treatment with the drug sorafenib has failed. 

In contrast to Asian countries, no virotherapy 
agent has so far been approved in the 
United States or Europe. The outcome of these 
trials may change this, generating additional 
valuable clinical tools for oncologists. ■ 

Blood ties 

The brain’s ability to generate new neurons declines with age. This reduction 
is mediated by increased levels of an inflammatory factor in the blood of ageing 
mice and is associated with deficits in learning and memory. SEE LETTER P.90 


RICHARD M. RANSOHOFF 


On the face of it, the production of new neurons 
in the adult mammalian brain¹,² sounds like a good thing. 
Interventions that reduce neurogenesis in 
adulthood can be associated with impaired 
brain function (in particular, with deficits in 
learning and memory), and the formation of 
neurons from neural stem cells declines with 
age. Understanding neurogenesis is therefore a 
major research goal, and neural stem cells are a 
tantalizing target for attempts to treat damaged 
brains by stimulating the production of neurons 
and other brain cells. On page 90 of this issue, 
Villeda and colleagues³ report a crucial advance 
in this direction by identifying a blood-borne 
factor that affects neurogenesis and cognitive 
function in ageing mice. 


With age, not only might the activity of 
neural stem cells (NSCs) deteriorate, but their 
immediate environment (the neurogenic 
niche) might also become compromised. The 
NSC niches lie near blood vessels, and factors 
that alter neurogenesis, such as exercise or 
systemic inflammation*”, might act by modi- 
fying blood cells or the abundance of signal- 
ling proteins in the blood plasma. Villeda 
et al. proposed, therefore, that agents present in 
the blood might affect neurogenesis. 

To test this possibility, the authors³ used a 
surgical procedure called parabiosis to connect 
the flank tissues of pairs of mice so that the ani- 
mals developed a shared circulation. In mouse 
pairs of the same age (young-young or old-old), 
parabiosis alone did not affect neurogenesis. In 
the old-young pairs, however, the older animal 
showed enhanced neurogenesis, and in younger 
mice neurogenesis was reduced. Also, learning 
and memory were impaired in young mice after 
they received injections of plasma from old, but 
not young, mice. 

What are the factors in the blood that 
differ between young and old mice and so affect 
neurogenesis? Villeda and co-workers ruled 
out direct effects of cells migrating from old to 
young mice in the parabiotic pairs, because cells 
from the older mice cannot enter the brain of 
young mice. They therefore compared a subset 
of plasma proteins — 66 to be exact — between 
young and old mice, evaluating young-old 
parabiotic pairs to see which proteins were more 
abundant in the older animals but reduced in 
the younger mice. They fixed on an unlikely 
culprit for altering neurogenesis, the chemokine 
protein CCL11. 

Chemokines constitute a genetically and 
structurally coherent subgroup of the immune 
mediator molecules called cytokines. But although 
they were originally discovered through their 
ability to direct the migration of inflamma- 
tory white blood cells, chemokines are now 
recognized as being regulatory factors in the 
development of tissues as diverse as the central 
nervous system and the urogenital system’. 
One chemokine, CXCL12, and its receptors 
CXCR4 and CXCR7, have been accorded 
pride of place in neurogenic-niche physiol- 
ogy because of their well-known roles in the 
development and function of the central nervous 
system’. CCL11, by contrast, has been 
mainly linked to allergic conditions such as 
asthma. So, do the levels of this chemokine 
simply correlate with an anti-neurogenic envi- 
ronment, or could it be that CCL11 actively 
affects neurogenesis? 

Villeda et al.³ provide several lines of 
evidence suggesting the latter possibility. 
When the authors injected this chemokine 
systemically into young animals, neurogen- 
esis was reduced. Moreover, antibodies that 
neutralize CCL11, when co-injected with the 
chemokine systemically or into the neurogenic 
niche itself, reversed this decline in neurogen- 
esis. Furthermore, brain slices from mice given 
CCL11 injections showed reduced long-term 
potentiation (LTP), a neurophysiological 
correlate of learning. 

Exactly how CCL11 affects neurogen- 
esis and cognitive function remains unclear. 
In vivo, CCR3, the receptor through which 
CCL11 signals, has not been reproducibly 
identified on NSCs or on the neural progenitor 
cells that arise from them; so it is likely that an 
indirect mechanism is at play. Previous work 
suggested that CCL11 could affect neurogen- 
esis through several pathways. For instance, its 
receptor could be present on microglia,” the 
brain cells that can produce cytokines and that, 
under some conditions, impair neurogenesis’. 
Also, exposure of NSC-containing mixed-cell 
cultures to CCL11 leads to decreased prolif- 
eration of NSCs’. However, the CCL11/CCR3 
signals may not always be deleterious: mice 
lacking CCR3 show greater neuronal loss than 
those with the receptor when the peripheral 
segment of their facial nerve is severed”. 

It is possible that CCL11 modifies the 
action of another cytokine. Myeloid cells in 
the meninges (the membranes lining the brain) 
are maintained in a state of restrained inflam- 
mation through the action of the regulatory 
cytokine interleukin-4 (ref. 14); this state of 
muted inflammation promotes learning and 
memory. CCL11 suppresses the ability of 
interleukin-4 to restrain the inflammatory 
functions of myeloid cells¹⁴ and might thus 
reduce neurogenesis, causing memory and 
learning deficits. 

But regardless of the mechanisms involved, 
the good news from this report³ is that NSCs 
in the ageing brain do not undergo irreversible 
decline and can respond to a favourable envi- 
ronment, which includes the circulation. The 
precise link between NSCs, neurogenesis and 
blood cells is probably complex, involving more 
than just CCL11. For example, an earlier study’® 
found that individual mice from a genetically 
heterogeneous stock show widely variable 
levels of neurogenesis, and that the degree of 
neurogenesis is strongly correlated with the 
ratios of two subsets of T cells in the blood — 
those expressing the CD4 or CD8 marker pro- 
teins — but is weakly correlated with a variety 
of behavioural tasks. What's more, altering the 
ratios of CD4- and CD8-bearing T cells modu- 
lates neurogenesis, rather than vice versa. 

So it seems that there is a much more robust 
connection than previously suspected between 
the sites of neurogenesis in the adult brain and 
the systemic circulating pool of cells and 
proteins. Given the difficulty of manipulating 
neurogenic niches directly, this information 
is encouraging and should inspire increased 
activity in both joggers and neuroscientists. ■ 

MATERIALS SCIENCE 

Dry solution to a sticky problem 


Sticking plasters revolutionized the protection of minor wounds, but they’re not 
ideal for fragile skin. A material that mimics the adhesive properties of certain 
beetles’ feet might provide a solution. 


JEFFREY M. KARP & ROBERT LANGER 


Adhesives that stick to skin for long periods of 
time, or over multiple cycles of use, are vital 
for medical applications. 
Such materials have to conform to stringent 
standards — for example, they must maintain 
robust adhesion during repeated application 
and removal without irritating the skin, and 
be non-toxic. Writing in Advanced Materials, 
Kwak et al.' report an exciting advance towards 
achieving these standards: an adhesive tape that 
uses micrometre-scale pillars on its surface to 
stick to skin. This innovation bypasses the need 
for a glue-coated surface, as is commonly used 
in conventional skin adhesives. 
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Skin adhesives are currently used billions 
of times a year — for example, in over-the- 
counter sticking plasters for the treatment of 
minor skin wounds, in transdermal patches 
for controlled drug delivery~’, and in tapes for 
affixing tubes or sensors to the skin in hospi- 
tals. Despite the remarkable success of these 
materials, a remaining challenge is to find 
adhesives suitable for use on the delicate skin 
of newborn infants and the elderly. Aged skin 
is particularly fragile, making it more suscep- 
tible to inflammation and damage’. Given that 
the number of people aged over 60 will double 
during the next two to three decades’, the need 
for skin adhesives for the elderly is becoming 
increasingly pressing. 


Pressure-sensitive surgical tapes first 
appeared in 1845, when the surgeon Horace 
Day applied rubber adhesive to strips of fab- 
ric’. For several years thereafter, minor cuts 
were treated with separate gauze and adhe- 
sive tape, but custom tailoring of the materials 
was required for domestic use. The first integrated 
skin-adhesive device, the Band-Aid, 
was invented in 1920 by Earle Dickson’, an 
employee at the company Johnson & Johnson. 
Dickson noticed that gauze and adhesive tape 
did not remain attached to his wife's fingers, 
which she frequently injured in the kitchen. He 
therefore placed gauze in the centre of a strip of 
tape and covered the adhesive and gauze with a 
layer of crinoline to maintain its tack and steril- 
ity. Johnson & Johnson began mass-producing 
these sticking plasters shortly thereafter, and 
today it is estimated that more than 100 billion 
of Dickson's Band-Aids have been made’. 

The adhesives currently used for sticking 
plasters are polymeric, pressure-sensitive 
adhesives based on acrylic compounds’. 
Although effective, acrylic adhesives can leave 
behind sticky residues, and they lose their 
grip after repeated use. To bypass the need 
for these glues, researchers have focused on 
adhesion mechanisms used by animals such 
as beetles and geckos, whose feet stick to walls 
without any glue. The mechanism of gecko- 
foot adhesion was elucidated’ in 2000, nearly 
two millennia after Aristotle first reported the 
phenomenon: each gecko foot contains up 
to 500,000 hairs, each tipped with hundreds 
of projections known as spatulae. Similarly, 
the feet of beetles in the Chrysomelidae fam- 
ily are covered with tiny mushroom-shaped 
structures that help them cling to surfaces. 

Gecko spatulae are roughly hundreds of 
nanometres in length, whereas the mush- 
room-shaped structures of Chrysomelidae 
beetles’ feet are on the micrometre scale. It is 
possible to mimic these adhesive structures 
using nanometre- or micrometre-scale engi- 
neering to modify the surfaces of materials. 
Synthetic gecko-inspired adhesives have been 
made, but it has been difficult to optimize their 
properties for successful adhesion to wet tis- 
sues (such as those found inside the body). 
To solve this problem, we have previously 
used a hybrid approach, whereby a rubbery 
polymeric substrate with the surface nano- 
topography of gecko feet was coated with a thin 
layer of tissue-reactive glue’. The resulting 
material maximized adhesion to wet tissue 
while minimizing tissue inflammation. 

Kwak et al.¹ have focused on achieving 
adhesion to dry skin in the absence of glue. 
They patterned the surface of a rubbery, 
non-toxic substrate with micrometre-scale, 
mushroom-shaped projections (Fig. 1) — a 
topology reported to be ideal for maximizing 
adhesion! — varying the dimensions of the 
projections until they achieved optimal adhe- 
sion to human skin in a direction perpendicu- 
lar to its surface. Remarkably, the substrate 
maintained good adhesion through up to 
30 cycles of attachment and removal, without 
causing significant damage to skin. 

Figure 1 | Glueless. a, Kwak et al.¹ have made a polymer material that has a surface covered in 
micrometre-scale, mushroom-shaped projections (upper inset). The projections mimic those found on 
certain beetles’ feet, and allow the substrate to adhere to human skin without the use of glue (lower inset). 
b, c, Commonly used sticking plasters (b) use an acrylic adhesive to stick to skin, but can leave behind a 
sticky residue and cause redness (c). d, e, A patch made from Kwak and colleagues’ material (d) reduces 
these effects (e). Scale bars, 1 cm. (Graphic adapted from ref. 1. Photos reproduced from ref. 1.) 

To demonstrate the functional utility of their 
adhesive, the authors integrated it into a wear- 
able diagnostic device that monitors the heart 
using electrocardiography. When attached to 
a patient's chest, the device recorded several 
vital signals from the heart in real time over a 
period of two days. For commercial applications, 
however, Kwak and colleagues’ material 
will probably require higher levels of adhesion: 
the reported system¹ achieved about 43% of 
the adhesion of a moderately sticky acrylic. 

The authors’ work is part of a growing body 
of research aimed at finding new materials 
that form interfaces with tissue. For exam- 
ple, another recent paper” describes single- 
use ultrathin membranes that adhere to skin 
using only van der Waals interactions, and 
which incorporate electronic components that 
can be used to perform electrophysiological 
recordings. For long-term applications, these 
technologies should be tested both in the 
presence of humidity or perspiration and to 
see how they cope with the shedding of dead 
cells. New approaches may be required to 
address such issues, perhaps involving surface- 
responsive materials’’. Nevertheless, there 
is every hope that innovations such as 
that of Kwak et al.¹ will one day bring new 
technologies to the bedside. 


Synthesis, assembly and applications of 
semiconductor nanomembranes 


J. A. Rogers, M. G. Lagally & R. G. Nuzzo 


Research in electronic nanomaterials, historically dominated by studies of nanocrystals/fullerenes and nanowires/ 
nanotubes, now incorporates a growing focus on sheets with nanoscale thicknesses, referred to as nanomembranes. 
Such materials have practical appeal because their two-dimensional geometries facilitate integration into devices, with 
realistic pathways to manufacturing. Recent advances in synthesis provide access to nanomembranes with extraordinary 
properties in a variety of configurations, some of which exploit quantum and other size-dependent effects. This progress, 
together with emerging methods for deterministic assembly, leads to compelling opportunities for research, from basic 
studies of two-dimensional physics to the development of applications of heterogeneous electronics. 


Nanomembranes (NMs) are monocrystalline structures with thicknesses 
of less than a few hundred nanometres and with minimum lateral dimensions 
at least two orders of magnitude larger than the 
thickness. They differ from thin films in that they exist as free-standing, 
isolated forms at some critical stage in their growth or processing, or in 
their final, device-integrated forms. Because NMs offer many features 
that cannot be reproduced in other material formats, they are of central 
importance to a rapidly expanding frontier in nanoscience and technology. 
The origins of work on NMs can be traced back nearly thirty years 
to exploratory research on cadmium-based nanocrystals¹ and spherical 
fullerenes². Studies of these and other ‘zero-dimensional’ materials 
evolved to include nanowires and carbon nanotubes³, partly because it 
is comparatively easy (although still difficult) to form electrical contacts 
to such ‘one-dimensional’ structures. Although diverse types of semi- 
conductor device are possible with individual wires/tubes, their practical 
application in high-yield, scalable systems faces formidable engineering 
challenges in assembly and other aspects of manufacturing. Materials in 
NM formats avoid these limitations, because their two-dimensional 
(2D) geometries are directly compatible with established device designs 
and processing approaches from the semiconductor industry, building 
naturally on decades of research in thin-film growth, patterning and 
processing. NMs also have finite-size and quantum characteristics in 
their electronic, phononic and optical properties, and have unique 
mechanical features, with effects related to shape distortions and folding 
of sheets, not found in zero- and one-dimensional materials. NMs can be 
made uniformly and repeatably (in size, shape, surface orientation, 
thickness and surface roughness) by ‘top-down’ methods used in semi- 
conductor device manufacture. 

Many advanced materials have begun to be studied in this format. These 
include a wide variety of inorganic materials, from those as common as 
silicon to esoteric layered compounds* ®, as well as a rapidly growing range 
of forms of conjugated carbon’ ”’, not limited to graphene. Sophisticated 
methods are becoming available for manipulating NMs with thicknesses of 
as little as a fraction of a nanometre and with lateral dimensions of up to 
many square centimetres, at high throughputs and yields'””. NMs can be 
distributed over large areas, folded into various shapes'*'° and wrapped 
onto curvilinear surfaces'’. Advanced electronic and optoelectronic 
devices have been reported, each with a unique 
combination of operating speed'*, heterogeneous 
layout'’, flexible design”', three-dimensional 
(3D) form’”’? and other features that would be 
difficult or impossible to achieve with existing bulk 
materials technologies or with zero- or one-dimensional nanomaterials. 
These advances motivate the present Review of approaches to synthesis, 
assembly, and device integration for inorganic and organic NMs, exclud- 
ing graphene, with an emphasis on challenges and future opportunities. 


Inorganic nanomembranes 


Single-crystalline inorganic semiconductor NMs with thicknesses that 
match length scales of important physical processes (a few hundred nano- 
metres or less) offer opportunities in basic and applied research, as well as 
in technology, as suggested by recent demonstrations of practical devices 
that offer operational features unavailable with bulk materials. Figure 1 
illustrates representative examples of NM properties in mechanics, elec- 
tronics, thermoelectrics and photonics. In the first instance (Fig. 1a), the 
extremely small thicknesses of NMs (down to ~2 nm for silicon) lead to 
flexural rigidities that can be more than fifteen orders of magnitude smaller 
than those of bulk wafers (~200 μm) of the same materials”. The resulting 
values are so small, in fact, that they qualitatively change the nature of the 
material to allow otherwise impossible, non-planar geometries and multi- 
layer integration options. The latter capabilities arise from the combined 
effects of low rigidities and energy release rates for thermally driven dela- 
mination that decrease linearly with thickness**. As a consequence, NMs 
conform and bond robustly to nearly any surface, thereby enabling them 
to be stacked onto one another or onto foreign hosts to yield unusual, 
heterogeneous systems that cannot be achieved with wafer-bonding tech- 
nologies or epitaxy. Such stacking can lead to unusual electronic, electro- 
mechanical, thermoelectric, optoelectronic, optomechanical and photonic 
behaviour. 
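
The scaling behind these numbers can be checked with a short calculation. The sketch below assumes the standard thin-plate expression for flexural rigidity, D = E t^3 / [12 (1 - nu^2)], which is not written out in the text; the Young's modulus and Poisson ratio are illustrative values for silicon that cancel in the ratio, and all function and variable names are hypothetical.

```python
import math


def flexural_rigidity(youngs_modulus, thickness, poisson_ratio):
    """Thin-plate flexural rigidity, D = E t^3 / (12 (1 - nu^2))."""
    return youngs_modulus * thickness ** 3 / (12.0 * (1.0 - poisson_ratio ** 2))


# Illustrative (assumed) elastic constants for silicon; they cancel in the ratio.
E_SILICON = 130e9   # Pa
NU_SILICON = 0.27

T_MEMBRANE = 2e-9   # 2 nm silicon NM, as quoted in the text
T_WAFER = 200e-6    # ~200 micrometre bulk wafer, as quoted in the text

# Rigidity scales with the cube of the thickness.
rigidity_ratio = (flexural_rigidity(E_SILICON, T_MEMBRANE, NU_SILICON)
                  / flexural_rigidity(E_SILICON, T_WAFER, NU_SILICON))

# The text states that energy release rates for delamination fall linearly
# with thickness, so the corresponding ratio is simply t_NM / t_wafer.
energy_release_ratio = T_MEMBRANE / T_WAFER

print(f"rigidity ratio: 10^{math.log10(rigidity_ratio):.1f}")
print(f"energy release rate ratio: 10^{math.log10(energy_release_ratio):.1f}")
```

The cubic dependence accounts for the fifteen orders of magnitude quoted for flexural rigidity, whereas the linear dependence of the energy release rate gives a reduction of five orders of magnitude for the same pair of thicknesses.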

Sufficient thinness yields the 2D physics of quantum confinement, even 
in simple, single-layer NMs, with important implications for electronic 
transport. Figure 1b shows, as an example, the splitting of the conduction- 
band-minimum valley of silicon into subbands as a function of the thick- 
ness of the NM, for two orientations”. The right-hand plot shows how 
NM roughness affects the 2D density of states for these quantum- 
confined NMs**. A related phenomenon is the extremely high sensitivity 
of charge transport to surface chemical condition, which also modifies the 
local band structure**. Other examples of dimensional effects in electronic 
properties are discussed below, in sections on synthesis and applications. 
The small thicknesses of NMs also strongly influence the behaviour of 
phonons and photons. Phonons can be used to increase key figures of 
merit (for example ZT) in thermoelectrics. NMs with arrays of nanoholes 
(Fig. 1c) that have lateral dimensions less than the mean free path of 
thermal phonons (~300 nm) and with thicknesses of about this value 
or less, produce strong backscattering effects that frustrate thermal trans- 
port, without reducing the electrical sheet resistance, owing to the com- 
paratively shorter mean free paths of electrons and holes**”’ (1-10 nm at 
high doping levels). Certain measurements of NMs with holes suggest 
thermal conductivities ~80 times smaller than values in bulk silicon, and 
enhancements in ZT of a factor of ~50 relative to NMs without holes”’. 
Conceptually related effects of confinement can be used to advantage in 
optoelectronics. Some of the earliest examples involve lasers using NM- 
based photonic crystals that inhibit spontaneous emission by 94% and 
guide preferential emission into vertical modes, improving output effi- 
ciency by a factor of almost five’*” (Fig. 1d). The thicknesses of the NMs 
in such systems are typically a fraction of the emission wavelength to 
guarantee single-mode behaviour. Measurements and simulations shown 
in Fig. 1d capture these effects** and also underscore the critical role of 
surface recombination in NMs, as discussed below in the context of 
related applications. 

Challenges common to all research efforts in NMs lie in the develop- 
ment of improved methods to synthesize NMs with precise dimensions 
and materials quality; to engineer structural features or distributions of 
strain in NMs to yield unusual charge, photon or phonon transport char- 
acteristics; to stack and bond NMs, with an emphasis on the control of 
surface/interfacial properties and on charge transport across bonded-NM 
interfaces; to deform NMs into unusual shapes for non-planar com- 
ponents, such as cylindrical microcavity lasers and stretchable, bio- 
integrated electronics; and to produce large numbers of NMs with 
precise geometries, efficiently and cost effectively, and assemble them 
in desired configurations at high throughputs. The following sections 
highlight exemplary recent advances and areas of future opportunity. 

Figure 1 | Unique physical properties in NMs. a, NMs have exceptionally 
high degrees of bendability, as illustrated in the scanning electron microscope 
(SEM) image. The flexural rigidity of a 2-nm-thick, silicon NM is ~10¹⁵ times 
smaller than that of its bulk wafer counterpart (200 μm thick), as illustrated in 
the red curve of the graph (dashed line at 2 nm). Related mechanics allows 
bonding of NMs to nearly any surface. Here energy release rates associated with 
opening of interfaces between NMs and supporting substrates decrease linearly 
with thickness. The blue line represents calculations for silicon NMs bonded to 
sheets of polyimide at room temperature, and then heated to 300 °C. 
b, Electronic confinement effects in silicon NMs lead to splitting of the 
conduction band valleys (Δ) for the (001) orientation (left) with representative 
1-s.d. error bars. Here the surface roughness (δ) strongly affects the 2D density 
of states** (DOS; right). a.u., arbitrary units. c, Phonon confinement in NMs 
offers opportunities for manipulating heat flow, to optimize figures of merit in 
thermoelectrics. The image shows a suspended silicon NM (22 nm thick; red 
arrow) perforated with arrays of nanoholes (diameter, ~10-15 nm; period, 
~35 nm) that scatter phonons, thereby frustrating thermal transport”. The 
data compare such structures (NM) with arrays of nanowires (NWA; 28 nm 
wide, 20 nm thick), coarsely patterned NMs (EBM; square mesh with period of 
385 nm, 22 nm thick) and uniform NMs (TF; 25 nm thick), with representative 
1-s.d. error bars*®. d, Photon confinement in NMs allows for low-threshold 
lasers. The SEM image shows a photonic crystal that consists of an array of 
nanoholes (period, ~500 nm) in a GaInAsP NM (245 nm thick), which is 
designed to suppress rates of spontaneous emission and, simultaneously, to 
direct light into vertical modes”*. The graph shows measurements” (symbols) 
of emission efficiency, normalized to the case without nanoholes, as a function 
of the ratio of the period of the array (a) to the emission wavelength (λ). The 
results indicate enhancements for a range of a/λ values. Calculations (solid 
lines) with various surface-recombination velocities (v_s) capture the trends”. 
The blue region corresponds to the location of the photonic bandgap. 
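
The valley splitting described for Fig. 1b can be estimated, very roughly, with a particle-in-a-box model. The sketch below is not the calculation behind the figure: it assumes an infinitely deep well of width equal to the NM thickness and textbook bulk effective masses for silicon (longitudinal 0.916 and transverse 0.19 free-electron masses), and all names are hypothetical. In a (001) film the two valleys along the confinement direction are quantized with the heavy longitudinal mass and the four in-plane valleys with the light transverse mass, so their lowest subbands separate as the film thins.

```python
import math

HBAR = 1.054571817e-34         # J s
ELECTRON_MASS = 9.1093837015e-31  # kg
EV = 1.602176634e-19           # J

# Assumed bulk effective masses for the silicon conduction-band valleys.
M_LONGITUDINAL = 0.916
M_TRANSVERSE = 0.19


def ground_subband_ev(thickness_nm, relative_mass):
    """Lowest infinite-well level, E1 = (pi hbar)^2 / (2 m t^2), in eV."""
    thickness_m = thickness_nm * 1e-9
    energy_j = (math.pi * HBAR) ** 2 / (
        2.0 * relative_mass * ELECTRON_MASS * thickness_m ** 2)
    return energy_j / EV


def valley_splitting_ev(thickness_nm):
    """Energy gap between the four-fold and two-fold valley ground subbands."""
    return (ground_subband_ev(thickness_nm, M_TRANSVERSE)
            - ground_subband_ev(thickness_nm, M_LONGITUDINAL))


for thickness in (20.0, 10.0, 5.0, 2.5):
    print(f"{thickness:5.1f} nm: {1e3 * valley_splitting_ev(thickness):7.1f} meV")
```

Under these assumptions the splitting grows as the inverse square of the thickness. Real membranes have finite barriers and rough surfaces, so the estimate should be read only as an indication of the trend.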


Synthesis 

In a route similar to that used to create graphene from bulk pieces of 
graphite, inorganic NMs can be (and have been for many years) formed 
by chemical or mechanical exfoliation from solids that have naturally 
layered structures*°, such as the semiconductors MoS₂ (refs 4, 30, 31), 
Sr₂Nb₃O₁₀ (ref. 5), GeS (ref. 32) and GeSe (ref. 32). Recent work indi- 
cates that confinement effects in single-layer NMs of MoS₂ (Fig. 2a) lead 
to direct bandgaps, unlike the indirect gaps of bulk samples”. 
Transistors that incorporate these NMs show field-effect mobilities 
greater than those that use graphene structured in nanoribbon geomet- 
ries required to create a bandgap for efficient switching behaviour*'. As a 
result, exfoliated NMs are of interest as ultrathin alternatives to gra- 
phene for active materials in next-generation electronics, where inter- 
band tunnelling might be used to improve performance in low-power 
devices further. 

Although exfoliation yields large numbers of NMs from certain 
classes of semiconductors such as MoS₂, additional methods are needed 
to control their dimensions and shapes, and to manipulate them for 
integration into systems. A synthetic strategy that addresses these 
requirements and expands the materials options involves the release 
of NMs from bulk semiconductors that are not naturally layered, by 
use of specialized anisotropic etching procedures. For example, defining 
trenches on the surfaces of silicon wafers with {111} orientations, and 
then patterning their side walls with etch resists provides a starting point 
for the anisotropic removal of material along the ⟨110⟩ directions using 
solutions of potassium hydroxide***’. The process (Fig. 2b) releases 
stacks, and, with minor modifications, essentially bulk-like numbers 
of NMs*. An appealing aspect of this synthesis is that lithography 
defines the lateral dimensions of the NMs and their spatial positions 
across the wafer, thereby rendering them compatible with methods for 
integration described below. Cycles of thermal oxidation and etching 
can reduce the thicknesses and passivate the interfaces. 

Figure 2 | Representative routes for synthesizing inorganic monocrystalline 
semiconductor NMs. a, Atomic structure of MoS₂, showing its layered 
configuration*'. Chemical or mechanical exfoliation of this material yields 
single-layer NMs (0.65 nm thick), as shown in the transmission electron 
microscope image on the right”. b, Process for generating multilayer stacks of 
silicon NMs from a bulk wafer by anisotropic etching. Patterned features of etch 
resist (gold) on the structured sidewalls of vertically etched trenches allow access 
of an anisotropic wet chemical etchant only to certain regions of the silicon. 
Etching releases silicon NMs (~100 nm thick), as shown at two intermediate 
times in the cross-sectional SEM images on the right”. c, Epitaxial multilayer 
assembly of GaAs and aluminium arsenide (AlAs) grown on a GaAs wafer”. 
Etching vertically through the thickness of the stack and then immersing the 
structure in hydrofluoric acid leads to the selective removal of the AlAs layers. 
Complete undercut etching releases large numbers of GaAs NMs. The SEM 
image shows a collection of GaAs membranes formed using this process”®. 
d, Release of a silicon NM from a SOI wafer. Etching vertically through the top 
silicon layer exposes the underlying SiO₂ layer, allowing its removal by etching in 
hydrofluoric acid. The optical image on the right shows a wrinkled, but 
completely single-crystal, silicon NM (~50 nm thick) formed in this manner 
that can then be transferred to a new host, where it will flatten and bond’®. 

Methods that offer atomically smooth surfaces and enhanced dimen- 
sional control over large areas begin with epitaxial growth to form 
releasable multilayer assemblies. For example, stacks of gallium arsenide 
(GaAs) films separated by aluminium gallium arsenide (AlGaAs) yield 
large numbers of GaAs NMs on selective removal of the AlGaAs with 
hydrofluoric acid**. This process provides significant cost and throughput 
advantages over related approaches that release only single layers”, 
owing to more efficient use of the systems for epitaxial growth and the 
supporting substrates*®. Fig. 2c shows an illustration of the process and an 
image of a deposit of released GaAs NMs formed by casting from a solution 
suspension”®. Similar multilayers can yield, from a single stack, NMs for 
multifunctional integrated systems of radio-frequency electronics, photo- 
detectors, light emitters and solar cells**"°. Epitaxial layers for passivation 
can be grown directly, to form an integral part of the NM. Multimaterial 
structures, such as those of GaAs with self-assembled quantum dots of 
indium arsenide (InAs) as embedded light emitters, are also possible, for 
applications in optoelectronics*’. The thin geometries of NMs can be 
important in these types of epitaxial synthesis because they avoid the 
dislocations that occur in films thicker than the critical thickness for defect 
formation in strained layers”. Additionally, greater materials possibilities 
follow from the use of NMs as growth substrates, where modified lattice 
constants or strain symmetries, not achievable in bulk materials, can be 
exploited, as described below. 

A complementary scheme that avoids the demands of epitaxy entirely 
uses layered materials formed by wafer bonding followed by polishing or 
controlled fracture. The most common example starts with a thin layer of 
silicon on silicon dioxide (SiO₂) supported by a silicon substrate, known 
as silicon on insulator'***** (SOI). Etching the buried oxide with hydro- 
fluoric acid releases the top silicon layer as a NM (Fig. 2d). Commercially 
available SOI can be used to make silicon NMs with thicknesses down to 
20 nm. Oxidation and etching can reduce the thickness to <2 nm, with 
uniformity greater than 0.3 nm (ref. 25). Other examples of SOI-like 
structures include group-IV analogues such as germanium on insulator, 
strained silicon on insulator and silicon-germanium (SiGe) on insulator, 
as well as III-V semiconductors and many other combinations**”. 
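
As a rough illustration of how cycles of oxidation and etching thin a silicon NM, the sketch below tracks the silicon that remains after each cycle. It assumes the commonly quoted textbook approximation that growing a given thickness of thermal oxide consumes about 0.44 times that thickness of silicon; this ratio, the oxide thicknesses and all names are assumptions for illustration and are not taken from the work reviewed here.

```python
# Assumed textbook ratio: nm of silicon consumed per nm of thermal SiO2 grown.
SI_CONSUMED_PER_OXIDE = 0.44


def thin_by_oxidation(silicon_nm, oxide_steps_nm):
    """Return the silicon thickness left after each oxidize-and-strip cycle."""
    history = [silicon_nm]
    for oxide_nm in oxide_steps_nm:
        consumed = SI_CONSUMED_PER_OXIDE * oxide_nm
        if consumed >= history[-1]:
            raise ValueError("oxide step would consume the entire membrane")
        history.append(history[-1] - consumed)
    return history


# Hypothetical schedule: start from 20 nm SOI and grow three sacrificial oxides.
history = thin_by_oxidation(20.0, [20.0, 15.0, 6.0])
for cycle, thickness in enumerate(history):
    print(f"after cycle {cycle}: {thickness:.2f} nm of silicon")
```

With this hypothetical schedule, three cycles take a 20-nm layer to just under 2 nm, which shows why the final oxidation steps must be controlled to a fraction of a nanometre.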


Strain engineering and 3D nanoarchitectures 

An intriguing aspect of the synthesis of NMs is the ability to modify 
structures by lithographic processing (Fig. 1c, d) or by introducing 
spatial distributions of strain. The latter offers great promise for the 
creation and investigation of new physical properties. The distinctive 
mechanical properties of NMs (see discussion above and Fig. 1a) allow 
this strain engineering. Strain changes the lattice constant, thereby 
creating new properties relative to the unstrained, but chemically ident- 
ical, material. The ability to alter the strain, in magnitude, direction, 
spatial extent, periodicity, symmetry and/or nature, allows tuning of 
the intrinsic properties to such a degree that many are significantly 
modified, including band structure, charge carrier mobility, atomic 
transport, atomic defect structure, the self-assembly of quantum dots, 
piezoresistivity, and more complex phenomena such as electro-optical 
effects”. 

Lattice strain can be introduced by heteroepitaxial growth of materials 
with different lattice constants'***°’. For example, in a trilayer of Si/SiGe/ 
Si grown on SOI, the SiGe layer is compressively strained and the silicon 
layers are unstrained. When this trilayer is released, the SiGe layer shares 
its strain elastically with the silicon layers, with strain magnitudes of up to 
1%. Strains in this range can cause significant changes in the band struc- 
ture of silicon®, such as to improve the performance of transistors on 
flexible polymers”’, and, if applied locally and periodically, to form strain- 
induced, single-element electronic heterojunction superlattices*””’. This 
elastic strain sharing, with appropriate processing, can be used to produce 
special defect-free semiconductors that cannot be realized in other 
ways”’, and, by taking advantage of differing crystal symmetries in the 
components of strained multilayer composites, to make entirely new 
materials with crystal symmetries that do not exist in bulk form and 
cannot be created by heteroepitaxy alone”. 
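
Elastic strain sharing in a released trilayer can be sketched with a simple force balance. The example below assumes a flat, symmetric and fully coherent Si/SiGe/Si stack, small strains, and a net in-plane force of zero after release; the mismatch, thicknesses and equal biaxial moduli are hypothetical inputs, and the function name is illustrative only.

```python
def share_strain(mismatch, t_film, t_si_total, m_film=1.0, m_si=1.0):
    """Strains after release of a coherent, flat Si/SiGe/Si multilayer.

    mismatch    lattice mismatch of the alloy relative to silicon (positive)
    t_film      thickness of the SiGe layer
    t_si_total  combined thickness of the two silicon layers (same units)
    m_film      biaxial modulus of SiGe (only the ratio to m_si matters)
    m_si        biaxial modulus of silicon

    Before release the silicon is unstrained and the SiGe carries -mismatch.
    After release the layers keep a common in-plane lattice constant
    (strain_si - strain_film = mismatch) and the net force vanishes
    (m_si * t_si_total * strain_si + m_film * t_film * strain_film = 0).
    """
    stiffness_film = m_film * t_film
    stiffness_si = m_si * t_si_total
    strain_si = mismatch * stiffness_film / (stiffness_film + stiffness_si)
    strain_film = strain_si - mismatch
    return strain_si, strain_film


# Hypothetical stack: 100 nm of SiGe between two 25-nm silicon layers.
strain_si, strain_sige = share_strain(mismatch=0.0105, t_film=100.0, t_si_total=50.0)
print(f"silicon strain: {100 * strain_si:+.2f}% (tensile)")
print(f"SiGe strain:    {100 * strain_sige:+.2f}% (compressive)")
```

In this picture the thickness ratio sets how the strain is divided: thin silicon layers on a thick alloy layer take up most of the mismatch as tensile strain, consistent with the strain magnitudes of up to 1% mentioned above.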

Strain can also be introduced mechanically. Again, the distinctive mech- 
anical properties of NMs are decisive in achieving novel properties. For 
example, it has recently become possible, because the conduction-band 
valleys shift by different amounts with strain, to stretch germanium NMs 
biaxially sufficiently to change germanium from an indirect-bandgap 
semiconductor to a direct-bandgap semiconductor”. This transformation 
allows use of germanium in light sources, thereby realizing the vision of 
group-IV-semiconductor integrated electronics and optics. 

A related consequence of strain control is that the geometries of NMs 
can be engineered to yield 3D shapes, allowing device configurations 
and properties that would be impossible to achieve with bulk materials. 
Possibilities include tubes that can provide active growth platforms for 
cells**, cylindrical microcavities that can serve as fluidic channels and 
optical sensors”, and buckled structures that can respond elastically to 
large strains and can be used in stretchable electronics**'. With 
appropriate engineering of strains in systems in which there is a strain 
gradient perpendicular to the layers, tubes, spirals, rings and other 3D 
nanoarchitectures can be achieved'*"'*“*. For isotropic elastic moduli, 
the bending, rolling or curling behaviour can be calculated using the 
classical Timoshenko formula™. In this case, a strained bilayer NM strip 
tends to roll (along its long direction) into a tube when the strip is wide 
or into a ring when the strip is narrow. A long narrow strip may, however, 
form a coil, owing to shear terms in the minimization of energy. Figure 3a 
illustrates the conditions. A tube will form if the width, W, of the strip is 
large relative to the radius of curvature determined by the bilayer strain, 
which is related to the circumference, L₀, of the tube. Beyond a critical 
angle, θ_c, determined by W/L₀ = sin(θ_c), a long, strained NM strip will 
roll into a coil of radius R₀, which also is related to the radius of curvature 
determined by the bilayer strain. More varied shapes can form when the 
materials have elastically soft and hard directions. Figure 3b shows rolled- 
up tubes of GaAs NMs with embedded quantum wells, which have 
applications as an unusual type of optical resonator®’. Fig. 3c shows 
spirals that form from ribbon-shaped SiGe NMs™. The centre regions remain 
flat because they are attached to the underlying substrate. Lithographic 
patterning allows elaborate architectures that create many new design 
opportunities in semiconductor devices, as described in the section on 
applications below. 

Figure 3 | Inorganic monocrystalline semiconductor NMs in non-planar 
configurations. a, Rolling and curling in a strained bilayer NM, illustrating the 
geometric parameters that determine the morphology: L₀ = 2πR₀ is the 
circumference of a tube that may form; L and W are the length and width of the 
strip, respectively; and t is its thickness. The critical angle for coil formation 
over tube formation is θ_c. The arrows indicate the folding direction’. b, SEM 
image of a collection of GaAs NMs with embedded quantum well structures. 
The tubular shapes form on release from the substrate, owing to strain in the 
epitaxial layers®’. c, SEM image of an array of partly released spiral structures 
formed by SiGe (10 nm)/Si (7 nm)/Cr (20 nm) NMs attached at their centres to 
a silicon wafer®. 
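
The geometric relations for rolling can be put into a short calculation. The sketch below assumes the classical Timoshenko bilayer expression for curvature, written for isotropic moduli with m the thickness ratio and n the modulus ratio of the two layers, and combines it with the relations L₀ = 2πR₀ and W/L₀ = sin(θ_c) given in the text. Applying the bimetal formula to an epitaxial mismatch strain with simple moduli is a simplification, and the input values and function names are hypothetical.

```python
import math


def timoshenko_curvature(mismatch_strain, t1, t2, e1, e2):
    """Curvature (1/radius) of a free bilayer strip with isotropic moduli.

    Uses kappa = 6 eps (1 + m)^2 / (h [3 (1 + m)^2 + (1 + m n)(m^2 + 1/(m n))])
    with m = t1/t2, n = e1/e2 and h = t1 + t2.
    """
    m = t1 / t2
    n = e1 / e2
    h = t1 + t2
    denominator = h * (3.0 * (1.0 + m) ** 2
                       + (1.0 + m * n) * (m ** 2 + 1.0 / (m * n)))
    return 6.0 * mismatch_strain * (1.0 + m) ** 2 / denominator


def critical_angle_deg(width, radius):
    """Critical angle from W / L0 = sin(theta_c), with L0 = 2 pi R0.

    Returns None when the strip is wider than the tube circumference,
    where the relation has no solution.
    """
    ratio = width / (2.0 * math.pi * radius)
    if ratio > 1.0:
        return None
    return math.degrees(math.asin(ratio))


# Hypothetical bilayer: two 10-nm layers of equal modulus, 1% mismatch strain.
curvature = timoshenko_curvature(0.01, 10e-9, 10e-9, 1.0, 1.0)
radius = 1.0 / curvature
strip_width = 0.5 * 2.0 * math.pi * radius   # half the tube circumference
angle = critical_angle_deg(strip_width, radius)

print(f"radius of curvature: {radius * 1e6:.2f} micrometres")
print(f"critical angle: {angle:.1f} degrees")
```

For equal layer thicknesses and moduli the expression reduces to a curvature of 1.5 times the mismatch strain divided by the total thickness, which is a convenient check on any implementation.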


Organic nanomembranes 


The chemical diversity and unique properties of organic NMs make 
them an attractive complement to the inorganic materials described 
above. Graphene provides the most compelling existing example 
because of its superlative mechanical, thermal and charge transport 
properties, and the device opportunities, unavailable to bulk graphite®, 
that it affords. These observations motivate work to explore other 2D 
carbon allotropes, such as those with sp- and sp²-hybridized carbon (for 
example graphyne and graphdiyne) with similar, molecular-scale thick- 
nesses’ ”°°°’. These materials, as well as other more structurally and 
compositionally diverse organic NMs, are expected theoretically to offer 
bandgaps, mobilities and other electronic characteristics that are much 
different from those of graphene’***”. For example, calculations suggest 
that graphdiyne has a mobility as great as ~10⁵ cm² V⁻¹ s⁻¹ and a 
bandgap of ~0.5 eV (ref. 68), thereby making this type of NM attractive 
as a semiconductor for power-efficient, high-speed transistors, where 
unpatterned graphene NMs cannot be used because of their intrinsically 
semi-metallic nature. 

The construction of 2D, ordered networks of carbon-carbon bonds in 
materials other than graphene remains a frontier challenge in chemistry*”. 
Difficulties lie in the establishment of bonds with the necessary precision 
in high-molecular-mass systems that have the requisite solubility and 
avoid the propensity to aggregate”. As a result, present techniques of 
bulk, solution-phase synthetic chemistry typically limit the formation of 
organic NMs to lateral dimensions barely exceeding the molecular regime. 
An alternative scheme involves the assembly of molecular building blocks 
at interfaces, in the form of self-assembled monolayers (SAMs) or 
Langmuir-Blodgett films, to provide planar precursor films for NMs that 
form by reactions between the molecular constituents. Figure 4a illustrates 
the former process, and Fig. 4b shows an example of a 2,5-substituted 
dialkynylbenzene-bearing SAM on silica, catalytically crosslinked by 
alkyne metathesis to yield a highly conjugated carbon NM (a remarkably 
tough monolayer-thick sheet), capable of release and transfer to a silicon 
wafer for possible integration with established electronically active com- 
ponents”. Alternatively, crosslinking can be accomplished in related 
SAMs by low-energy electron bombardment”. For nitroaryl thiols on 
gold, this process forms a dense, crosslinked matrix in the aryl segments 
and elicits a reduction of the chain-end nitro substituents to amino 
groups. Detachment from the gold yields a NM that bears thiol groups 
on one face and amino groups on the other, with potential relevance for 
devices that demand different electronic interfaces on top and bottom. In 
Fig. 4c, the NM sits on a supporting grid; the colour arises from fluorescent 
labelling of the amino side. This synthesis also allows patterning by spa- 
tially modulating the electron dose, which has clear relevance for use in 
devices”’. Pyrolysis yields metallic NMs that have conductivities similar to 
those found in graphitic forms of carbon”’, with immediate applications as 
conductive grids for transmission electron microscopy. 

Although reported Langmuir-Blodgett and SAM techniques can yield 
macroscopic NMs in various forms, the absence of long-range coherence 
in the bonding configurations severely degrades charge transport char- 
acteristics. This challenge can be addressed with interfacial interactions 
templated by monocrystalline substrates. Recent work demonstrates the 
feasibility of this idea in the case of graphene-like NMs, in a manner that 
seems to be extendable to other organic NMs’*”*. Figure 4d, e shows 
chevron-shaped graphene ribbons synthesized on a Au(111) crystal 
through the thermolytic condensation and cyclodehydrogenetic conver- 
sion of the molecular precursor 6,11-dibromo-1,2,3,4-tetraphenyltriphe- 
nylene”™. These ribbons align and assemble along the direction of 
the corrugation of the herringbone reconstruction of the Au(111) sub- 
strate. Related approaches have yielded interesting classes of crystalline 
monolayers from assemblies of molecules of 1,4-benzenediboronic acid, 
interlinked by the B-O-B bonds of the boroxine moiety”*. Recent results 
suggest that other covalent organic frameworks can be synthesized using 
graphene as a templating substrate’’. With additional work, these types of 
chemistry have great potential to yield the classes of monocrystalline 
organic NMs considered in this Review. 


Single- and multilayer assembly 


Systematic scientific studies benefit from, and engineering applications 
require, reliable techniques for integrating NMs into device or test struc- 
tures. Materials created by bulk synthesis or by batch exfoliation from 
solids or surfaces can be assembled most readily through processes that 
start with suspensions of NMs in fluids’”””, where adapted forms of 
ultracentrifugation, membrane filtration and chromatography offer 
means of separation, size selection and purification®’. Langmuir- 
Blodgett techniques and controlled precipitation can yield thin-film- 
type assemblies, in single- or multilayer formats’”*°*'**. The resulting 
Figure 4 | NMs of conjugated carbon and their synthesis using interfacial 
methods. a, Approach to synthesis based on chemical crosslinking of a SAM”. 
b, Optical micrographs of a highly conjugated carbon NM synthesized by 
crosslinking a 2,5-substituted dialkynylbenzene SAM by alkyne metathesis, 
resting on a SiO₂/Si substrate’. A wrinkled region of the NM appears in 
magnified view on the top right. The chemical structure of the monomer 
appears on the bottom left. c, Fluorescence resonance energy transfer image of a 
~1-nm-thick ‘Janus’ NM (blue) suspended over a supporting, hexagonal grid 
structure’! (black). This NM, which has some tears and other defects, was 
formed by exposing a 4'-nitro-1,1'-biphenyl-4-thiol SAM to electrons at 
100 eV and 50 mC cm⁻². d, Chemical synthesis of chevron-shaped graphene 
nanoribbon structures on Au(111), formed by thermolytic condensation and 
cyclodehydrogenetic conversion of the molecular precursor 6,11-dibromo- 
1,2,3,4-tetraphenyltriphenylene”. e, High-magnification (inset) and low- 
magnification (main image) scanning tunnelling microscope images of straight 
and chevron-shaped graphene nanoribbons on Au(111) synthesized in the 
manner illustrated in d”. 




deposits can be transferred to substrates of interest for integration into 
devices as thin films. The levels of control in such solution-based processes 
can be enhanced through patterned surface functionalization, controlled 
fluid flows, capillarity or shape complementarity to guide the placement 
of individual NMs, using schemes originally developed for nanowires/ 
nanotubes and for small-scale integrated circuits****. The intrinsically 
stochastic nature of these processes of guided self-assembly, however, 
imposes limits on the yields and the placement accuracy. 

By contrast, NMs formed at interfaces that make controlled release 
possible can be manipulated using purely deterministic assembly tech- 
niques, with high yields and extreme accuracy in position, orientation 
and layout. Such capabilities are essential for all foreseeable applications 
in electronics, because of the requirement to integrate the NMs at spe- 
cific locations in larger systems with submicrometre accuracy. The most 
well-developed approaches exploit forms of printing using soft, elasto- 
meric stamps that allow manipulation of NMs without exceeding the 
critical strain levels for structural damage’?’*°**?**. As an advanced 
example, Fig. 5a shows a structured stamp (blue) designed to allow 
pressure-modulated adhesion with two states, strong adhesion (ON) 
and weak adhesion (OFF), to facilitate retrieval and printing, respec- 
tively; here, the stamp is ‘inked’ with a thick platelet of silicon (green). 
(Temporary ‘carrier’ films of photoresist allow manipulation of NMs 


[Figure 5 graphic: panels a-d; panel b plots adhesion in the ON and OFF states against peeling rate (0-700 μm s⁻¹).]


Figure 5 | Operation of elastomeric stamps for deterministic assembly of 
NMs, with examples of printed sparse arrays and multilayer assemblies. 

a, Colourized SEM images of a single post in an elastomeric stamp (blue) that 
uses soft, pyramidal relief features to provide strong adhesion in a collapsed state 
(ON) and weak adhesion in a retracted state'® (OFF). Control of the applied 
pressure allows reversible switching between these two states. b, Measured 
adhesion strength in the ON and OFF states, as a function of peeling rate'®. 
Viscoelastic effects in the elastomer lead to monotonic increases in adhesion 
with rate, with pronounced effects observable in the ON state. The dashed lines 
are guides for the eye. c, Sparse array of GaAs membranes (small black squares) 
assembled by printing onto a plate of glass (main image) and a bent sheet of 
plastic (inset)*°. d, Cross-sectional SEM of an eight-layer stack of silicon NMs 
(each ~340 nm thick) separated by transparent layers of polymer. Inset, 
schematic; the red box outlines the cross-section shown in the main image. 
with arbitrarily small thicknesses.) The data in Fig. 5b illustrate this 
switching capability, where both geometric’® and viscoelastic'' effects 
have important roles. Printing in single or multiple cycles can populate 
small or large areas of flat or curved substrates with NMs at any level of 
coverage, from full monolayers to sparse distributions in regular or 
irregular layouts. Figure 5c shows an organized collection of membranes 
of GaAs printed onto a flat plate of glass and a bent plastic substrate, 
starting from a dense array released from a GaAs wafer, using tech- 
niques similar to those of Fig. 2c’°. Recent work indicates that it is even 
possible to print rolled-up NMs (Fig. 3b) using soft stamps">. 

These methods, and related ones that rely on hard stamps or releas- 
able tapes, can be used with both inorganic and organic NMs to yield 
various configurations, including suspended ‘bridge’ and ‘drumhead’ 
geometries'®”® for resonators in nanoelectromechanical systems, large- 
area, continuous sheets*’ for flat-panel displays, and unusual, hybrid- 
material constructs**. As an example of the last possibility, Fig. 5d shows 
a stack of eight silicon NMs, each separated by a transparent polymer 
film, for an application in multilayer optoelectronics where these layers 
support waveguide arrays for phase-controlled beam steering. Fully 
automated tools now exist for assembling these and related struc- 
tures**”°, with printing rates of up to millions of NMs per hour, or more, 
depending on the sizes and layouts. Yields of nearly 100% and placement 
accuracy better than 1 μm can be achieved with NMs having thicknesses 
in the near-atomic range, relevant for single-layer NMs*””, and areas of 
square centimetres or more. 


Applications in electronics and optoelectronics 


The synthesis and assembly techniques described above, coupled with 
the unique mechanics and confinement effects provided by NMs, pro- 
vide diverse capabilities in electronics. The most intriguing areas of 
application are those that cannot be addressed with any other bulk, 
thin-film or nanomaterials technology. Advanced demonstration 
devices of this type have been reported, perhaps most significantly in 
the area of flexible and stretchable electronics. In these cases, NMs of 
silicon!®*1°3°°!°?, GaAs (refs 36, 40, 93) or GaN (ref. 94) serve as active 
materials, mounted on plastic or rubber substrates and configured in 
mechanical layouts and 3D nanoarchitectures that allow bending, 
stretching, folding, twisting and other demanding modes of deforma- 
tion without inducing damage or fatigue in the materials. 

An emerging NM technology that exploits these features involves 
intimate coupling of electronics with biological tissues in ways that 
would otherwise be impossible*'**”*. Fig. 6a shows an example of this 
sort of bio-integrated device, in which a thin, flexible film supports an 
interconnected array of 288 actively multiplexed, amplified sensing elec- 
trodes”’. The system includes more than 2,000 silicon NM transistors on 
a thin sheet of polyimide, in a waterproof format that non-invasively 
laminates onto the epicardial surface (porcine animal model shown 
here), like a piece of Saran wrap. The device performs temporal and 
spatial mapping of electrophysiology at unprecedented levels of speed 
and resolution, for diagnostic purposes in surgical procedures to treat 
arrhythmias and other forms of cardiac disease. The colour inset in 
Fig. 6a shows typical data. Related components can serve as advanced 
surgical tools or therapeutic devices, not only in cardiology but also in 
neurology and other areas. 

Key performance attributes of the NM transistors in these systems are 
enhanced by layers of SiO₂ that passivate the surfaces of the silicon to 
facilitate charge transport. As a result, these circuits have performance 
comparable to analogous systems on SOI, with normalized current 
outputs” and switching speeds” that exceed those of systems based 
on organic semiconductors or films of quantum dots or nanowires. 
Such NMs can in fact provide a route to plastic radio-frequency elec- 
tronics”. Fig. 6b shows gigahertz-frequency operation in silicon NM 
transistors on a sheet of polyethylene terephthalate’*. This type of tech- 
nology provides a radio-frequency electronics platform of interest not 
only for its unusual mechanics but also for its potential as a low-cost 
Figure 6 | NMs as active materials in unusual electronic and optoelectronic 
devices. a, Bio-integrated electronics for high-resolution mapping of cardiac 
electrophysiology in a porcine animal model, with applicability in humans”. The 
device consists of nearly three hundred independent measurement locations, 
with local amplifier circuits and multiplexers that collectively use more than 2,000 
silicon NM transistors in a waterproof construction on a thin sheet of polyimide. 
The colour inset provides a representative map collected using this device. 

b, Small-signal current gain (H21) and power gain (Gmax) as functions of 
frequency for high-speed silicon NM transistors on a plastic substrate’*. The 
results show limiting frequencies of 3.8 GHz for current gain (fT) and 12 GHz for 
power gain (fmax). For power gain, the solid and dashed lines correspond to 
measurement and simulation, respectively. c, Cross-sectional transmission 
electron microscope images of a transistor that uses an InAs NM heterogeneously 
integrated on an oxidized silicon wafer, at moderate (top) and high (bottom) 
magnifications”’. d, Measured (solid) and simulated (dashed) width-normalized 
drain-source currents (IDS) as functions of gate-source voltage (VGS) for 
transistors as in c with InAs NM thicknesses of 8, 13, 18 and 48 nm (ref. 96). A 
cross-sectional schematic of the transistor is shown at top. e, SEM images of a 
cylindrical microcavity laser formed with a NM (~50 nm thick, with two 
InGaAs/GaAs quantum dot layers and a pseudomorphic In0.18Ga0.82As quantum 
well), at low (top) and high (bottom) magnifications”. f, Integrated output 
intensity as a function of excitation power (HeNe laser emission at 632.8 nm) for 
emission in an optically resonant mode with a wavelength of 1,240.7 nm (ref. 97). 


alternative to conventional systems, which require semiconductor 
wafers as substrates. 
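
The limiting frequencies quoted for Fig. 6b can be related to gain at lower frequencies by a simple rule of thumb. The sketch below assumes a single-pole roll-off of the small-signal current gain (20 dB per decade), which is a common textbook approximation and not necessarily the extraction procedure used for these devices. The function names and sample frequencies are hypothetical.

```python
import math

def extrapolate_ft(freq_hz: float, h21_mag: float) -> float:
    """Estimate the cutoff frequency from one |H21| point, assuming |H21| = fT / f."""
    return freq_hz * h21_mag

def h21_db(freq_hz: float, ft_hz: float) -> float:
    """Current gain in dB at a given frequency under the same single-pole assumption."""
    return 20.0 * math.log10(ft_hz / freq_hz)

ft = 3.8e9  # current-gain cutoff frequency quoted in the caption of Fig. 6b
for f in (0.38e9, 1.0e9, 3.8e9):
    print(f"f = {f / 1e9:.2f} GHz -> |H21| ~ {h21_db(f, ft):.1f} dB")
```

Under this assumption the current gain falls to unity (0 dB) at the cutoff frequency and is ten times larger (20 dB) one decade below it.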

In the devices of Fig. 6a, b, NMs overcome incompatibilities in proces- 
sing and growth conditions between high-temperature active materials 
and low-temperature substrates, by exploiting the mechanics illustrated 
in Fig. 1a. A related advantage appears in the context of mainstream 
electronics in cases where integration by means of heteroepitaxial growth 
or wafer bonding is impossible because of mismatches in lattice constants 
or coefficients of thermal expansion, respectively. For example, silicon 
NMs can be bonded directly to germanium substrates and vice versa, 
with narrow interfaces and high cross-interface electrical conductivity, 
which is challenging to achieve with bulk wafer-bonding techniques. 
Such unique capabilities make possible a range of group-IV optoelectronic 
and tunnelling devices in which charge transport across interfaces is 
essential”’. 

Equally important in digital integrated circuits, but not typically 
requiring cross-interface transport, is joining compound-semiconductor 
NMs with silicon wafers, down to the level of individual devices. There 
the high mobilities and conductivities of InAs NMs, for instance, could 
help to overcome fundamental limitations on the speed and energy effi- 
ciency of silicon transistors. Figure 6c shows cross-sectional transmission 
electron micrographs of a transistor formed from an InAs NM printed 
onto a silicon wafer”® using the techniques described in the previous 
section and the bonding mechanics illustrated in Fig. 1a. Figure 6d 
illustrates the transistor schematically and shows its current-voltage 
characteristics. Strong confinement effects allow orders-of-magnitude 
increases in switching ratios compared with bulk devices, mainly through 
improved electrostatic coupling to the gate, and reductions in the 
maximum currents due to a transition from 3D to 2D transport. 
Figure 6d shows trends as the thickness of the NM increases from 8 to 
48 nm. Interfacial layers of InAsOx, thermally grown onto the InAs NMs 
before device integration, are critically important in reducing the inter- 
facial trap densities. 

Passivated NMs can also be used to achieve radically different device 
geometries. As an example, Fig. 6e shows a cylindrical microcavity laser 
formed by stress-induced rolling of a GaAs/InGaAs NM with embedded, 
self-organized quantum dots as the gain medium”’. As configured in a 
suspended state, the tube yields a cavity with excellent coupling of the 
maximum field intensity and the gain region. Minimal optical scattering 
and reduced non-radiative recombination at surface defects result 
from epitaxially smooth surfaces and effects of carrier confinement, 
respectively. The graph in Fig. 6f shows measured characteristics of the 
microcavity laser. Non-planar device designs represent an interesting 
alternative to the more widely explored microcavities based on photonic 
crystals, microdisks, micropillars and other geometries. These and other 
unique 3D nanoarchitectures allowed by NMs are also being explored for 
use in energy storage devices”, sensors” and other components”. 


Conclusions and outlook 


The existence of a recently developed, powerful set of capabilities in NM 
materials and assembly, taken together with multiple important and 
uniquely addressable application areas, provides excellent motivation 
for expanded activities in this rapidly emerging field. The possible topics 
for basic study are many, and include investigations of physical, chemical 
and transport properties at interfaces between heterogeneous, printed 
stacks of NMs; phenomena in the limit of ultrathin geometries where 
quantum confinement, interface depletion effects and molecular modi- 
fication can be important; modified phonon and thermal characteristics 
for controlled heat flow in structured NMs; and strain engineering for 
spatially modulated bandgap properties. 

For work in engineering, the most promising areas are in systems with 
operational features that cannot be achieved using established approaches. 
The overall device integration, as currently practiced, involves release of 
NMs from a source substrate, followed by assembly and final intercon- 
nection. This sequence is much different from the prevailing trend in 
conventional electronics, where individual devices, produced at the 
highest levels of interconnection possible on a semiconductor wafer, 
are diced and assembled as a terminal packaging step. The engineering 
challenges and balance of costs associated with the NM approach are 
important topics of research and development in manufacturing. An 
optimized process might incorporate a blend of strategies whereby, for 
example, some significant level of integration is accomplished on the 
NMs before their release and assembly, depending on the details of the 
application. 




Techniques for synthesizing NMs are central to all future activities. 
New ideas are needed to expand the range of inorganic NM materials 
beyond those that can be achieved by known exfoliation, etching, 
epitaxial and bonding methods. As for other classes of nanomaterials, 
morphological and chemical properties of the surfaces of NMs are para- 
mount. In some cases, existing technologies for surface passivation can 
be adapted to create multilayer NM structures, such as SiO₂/Si/SiO₂, that 
embed critical interfaces and isolate them from the environment. In 
others, these surfaces can be exposed and appropriately functionalized 
for applications in sensors. Progress on these and related topics will 
facilitate applications in fields inclusive of but wider ranging than elec- 
tronics and optoelectronics, such as nanoelectromechanical devices, 
photonic/plasmonic structures, thermal and/or mechanical energy- 
harvesting elements, micro/nanoscale pressure sensors, micro/nanoflui- 
dic devices, molecular sensors, sieves, scaffolds for cell culture and others. 
Synthesis of functionally useful organic NMs is a persistent and notable 
challenge in chemistry, but one that now seems possible to address by 
interfacial assembly and crosslinking on monocrystalline substrates. For 
all classes of materials, understanding the physics of transport in shaped, 
chemically functionalized and/or strain-engineered NMs may lead to 
additional properties that lie beyond those that can be achieved otherwise. 

Application opportunities seem to be particularly promising in bio- 
integrated systems, where many NM materials and structures might be 
combined in packages that establish and actively maintain intimate, 
dynamic interfaces with the body. In such cases, organic and inorganic 
NMs could function together, with the former at the bio-interface for 
sensing, exchanging materials and establishing bio-compatibility, and 
the latter separately located for the purposes of actuating, processing and 
transmitting data and providing power. The ability to engineer ‘soft’, 
elastic responses and 3D, curvilinear configurations in NMs with opti- 
mized nanoarchitectures on ‘tissue-like’ substrates will be essential to 
satisfying dimensional and mechanical requirements set by biological 
constraints. Understanding the nanomechanics of these hard-soft 
hybrid-material constructs, where the elastic moduli can differ by a 
factor of one million or more, will be necessary to allow precise physical 
matching to tissues. Recent advances, for example, demonstrate the 
ability to form NM electronics with the properties of the epidermis”. 
An ultimate goal might be systems based on man-made NMs that pro- 
vide seamless, integrated functions in living systems, potentially rival- 
ling those of naturally occurring NMs in biology. The interesting 
fundamental science, the diverse possibilities for creative engineering 
and the strong potential for broadly influential outcomes make this field 
of NM research a fertile one for future investigation. 
Genome-wide association studies (GWAS) have identified many risk loci for complex diseases, but effect sizes are 
typically small and information on the underlying biological processes is often lacking. Associations with metabolic 
traits as functional intermediates can overcome these problems and potentially inform individualized therapy. Here we 
report a comprehensive analysis of genotype-dependent metabolic phenotypes using a GWAS with non-targeted 
metabolomics. We identified 37 genetic loci associated with blood metabolite concentrations, of which 25 show effect 
sizes that are unusually high for GWAS and account for 10-60% differences in metabolite levels per allele copy. Our 
associations provide new functional insights for many disease-related associations that have been reported in previous 
studies, including those for cardiovascular and kidney disorders, type 2 diabetes, cancer, gout, venous 
thromboembolism and Crohn’s disease. The study advances our knowledge of the genetic basis of metabolic 
individuality in humans and generates many new hypotheses for biomedical and pharmaceutical research. 


Understanding the role of genetic predispositions and their interaction 
with environmental factors in complex chronic diseases is key to the 
development of safe and efficient therapies, to diagnosis and to preven- 
tion. GWAS have identified hundreds of disease-risk loci!; however, 
functional information on the underlying biological processes is often 
lacking’. Previously, we have shown the promise of using associations 
with blood metabolites as functional intermediate phenotypes (so- 
called genetically determined metabotypes (GDMs)) to understand 
the potential relevance of genetic variants for biomedical and phar- 
maceutical research**. Building on this previous work, we present here 
the most comprehensive evaluation of genetic variance in human meta- 
bolism so far, combining genetics and metabolomics for hypothesis 
generation ina GWAS. We used an extensive, non-targeted and meta- 
bolome-wide panel of small molecules, analysing >250 metabolites 
from 60 biochemical pathways in serum samples from 2,820 individuals 
from two large population-based European cohorts. We identified 37 
genetic loci that were significant at a stringent genome-wide threshold. 
In contrast to most GWAS, these loci showed exceptionally large effect 


sizes of 10-60% per allele copy in 25 loci. In the majority of cases, a 
protein that is biochemically related to the associated metabolic traits is 
encoded at these loci. As a proof-of-principle validation of new dis- 
coveries, we experimentally validated the predicted function of 
SLC16A9 as a carnitine efflux transporter. We further cross-referenced 
these loci with databases of disease-related and pharmaceutically- 
relevant genetic associations, uncovering hitherto unknown links and 
providing new hypotheses for the functions of these loci. We have made 
a knowledge-base resource publicly available via a web-server to aid 
future functional studies, and biological as well as clinical interpretation 
of GWAS findings. This study provides compelling evidence for novel 
associations of metabolic traits with a wide range of loci of biomedical 
and pharmaceutical interest, and indicates a powerful new paradigm for 
dissecting human metabolic and disease pathways. 


Study design 


Metabolic profiling was done on fasting serum from participants in the 
German KORA F4 study (n = 1,768) and the British TwinsUK study 
(n = 1,052), using ultrahigh-performance liquid-phase chromatography 
and gas-chromatography separation, coupled with tandem 
mass spectrometry*’. We achieved highly efficient profiling (24 min 
per sample) with low median process variability (<12%) of more than 
250 metabolites, covering more than 60 biochemical pathways of 
human metabolism (Supplementary Table 1). On the basis of our 
previous observation that ratios between metabolite concentrations 
can strengthen the association signal and provide new information 
about possible metabolic pathways*’, we included all pairs of ratios 
between these metabolites in the genome-wide statistical analysis. To 
reduce the computational and data-storage burden associated with 
meta-analysing more than 37,000 metabolites and ratios, we applied 
a staged approach for selection of promising association signals 
(Supplementary Fig. 1). In the initial screening stage, we assessed the 
association of approximately 600,000 genotyped single nucleotide 
polymorphisms (SNPs) with more than 37,000 metabolic traits (con- 
centrations and their ratios) by fitting linear models separately in both 
cohorts to log-transformed metabolic traits and adjusting for age, 
gender and family structure (Supplementary Fig. 2 and Supplemen- 
tary Table 2). Next, we selected all association signals showing sug- 
gestive evidence of association with a metabolic trait in both cohorts 
(P< 10 ° in both cohorts, or P< 10 ° in one and P<10 ° in the 
other). For each of these loci, we then reassessed the association signals 
through fixed-effects inverse variance meta-analysis of the two cohorts 
for all 37,000 available traits, using imputed SNPs relative to HapMap2 
data (see Methods for details). The combination of SNP and trait that 
yielded the smallest P value in this meta-analysis was finally selected for 
each locus. To account for multiple testing, we applied conservative 
Bonferroni correction, leading to an adjusted threshold for genome-wide 
significance of P < 2.0 × 10⁻¹². 
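
The scale of the testing problem, and hence the order of magnitude of the significance threshold, follows from simple counting. The sketch below assumes a nominal family-wise error rate of 0.05 (a value not stated in this section) and uses the rounded figures quoted above. The function names are hypothetical.

```python
def n_traits(n_metabolites: int) -> int:
    """Metabolite concentrations plus all pairwise ratios: n + n*(n-1)/2."""
    return n_metabolites + n_metabolites * (n_metabolites - 1) // 2

def bonferroni_threshold(alpha: float, n_snps: int, n_traits_tested: int) -> float:
    """Bonferroni-adjusted P-value threshold for n_snps x n_traits_tested tests."""
    return alpha / (n_snps * n_traits_tested)

# 250 metabolites already give more than 31,000 traits; the >37,000 traits
# quoted in the text correspond to roughly 270 metabolites.
print(n_traits(250))  # 31375
print(n_traits(272))  # 37128

# Rounded figures from the text: ~600,000 SNPs and ~37,000 traits.
# alpha = 0.05 is an assumption made for this illustration.
threshold = bonferroni_threshold(0.05, 600_000, 37_000)
print(f"{threshold:.2e}")
```

With these rounded inputs the threshold comes out near 2.3 × 10⁻¹², the same order of magnitude as the genome-wide threshold adopted in the study; the exact counts used by the authors are not given in this section.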


Study results 


We identified 37 independent loci that reached genome-wide signifi- 
cance in the meta-analysis (Table 1 and Supplementary Tables 3 and 
4). Twenty-three of these loci describe new genetic associations with 
metabolic traits, and 14 replicate and extend our knowledge of known 
GDMs, including 10 from our own studies**. We used information on 
the locations of SNPs in genes, on known gene functions and on regional 
association plots (Supplementary Fig. 2) to prioritize plausible candidate 
genes within associated loci. In most cases, our annotation was further 
supported by a statistical analysis of the association of gene relationships 
in published literature’ (Supplementary Table 5). Associations with addi- 
tional metabolic traits at the 37 loci presented in Table 1 may capture 
further biochemical information, and are provided as Supplementary 
Table 6. At 30 loci, the sentinel SNP mapped to a protein that was 
biochemically linked to the associating metabolites, for instance because 
it was responsible for their synthesis, degradation or metabolism. Next, 
we searched literature and databases extensively (see web links in 
Methods) to identify which of these 37 loci were previously reported as 
being associated with a clinical endpoint, a medically relevant intermedi- 
ate phenotype, or a pharmacogenetic effect. Associations of metabolites 
with disease loci can be used to gain novel information about possible 
metabolic changes associated with biological processes underlying that 
association (Fig. 1, Table 1 and Supplementary Table 7). In 15 cases, such 
a relationship could be identified on the basis of an association of the lead 
SNP or a proxy (r² = 0.8) with the disease-associated SNPs, including 
those for cardiovascular disease, kidney disease, Crohn’s disease, gout, 
cancer, adverse reactions to drug therapy and predisposing risk factors 
for diabetes and cardiovascular disease. In all except three loci, the SNPs 
are common, with minor allele frequencies greater than 10%. In 25 cases, 
the effect size per allele copy is larger than 10%, and up to 60% in the case 
of the acyl-CoA dehydrogenase (ACADS) locus. 
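
The pooled per-allele effects discussed here can be illustrated with a fixed-effects inverse-variance combination of two cohort estimates, followed by conversion of a log-scale effect into a percentage difference per allele copy. This is a minimal sketch and not the authors' pipeline: the base-10 logarithm, the normal approximation for the P value, the function names and the example numbers are all assumptions made for illustration.

```python
import math

def inverse_variance_meta(betas, ses):
    """Fixed-effects inverse-variance pooling of per-cohort effect estimates.

    Returns the pooled effect, its standard error and a two-sided P value
    from a normal approximation.
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return pooled_beta, pooled_se, p_value

def percent_per_allele(beta_log10: float) -> float:
    """Percent change in a trait per allele copy for an effect on the log10 scale."""
    return (10.0 ** beta_log10 - 1.0) * 100.0

# Hypothetical per-cohort estimates (log10 scale) and standard errors.
beta, se, p = inverse_variance_meta([0.040, 0.050], [0.005, 0.008])
print(f"pooled beta = {beta:.4f}, SE = {se:.4f}, P = {p:.1e}")
print(f"effect per allele copy = {percent_per_allele(beta):.1f}%")
```

The more precise cohort receives the larger weight, so the pooled estimate sits closer to its value; the back-transformation shows how a modest effect on the log scale translates into a percentage difference in metabolite level.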


Overlap with chronic disease loci 


Many genetic-risk loci for heart disease, kidney failure, diabetes and 
other complex disorders have been identified by GWAS. However, the 
aetiology of these common diseases is complex and testable hypotheses 
are needed to develop new avenues for diagnosis and therapy. 
Associations of known disease-risk loci with metabolic traits allow the 
identification of new and potentially relevant biological processes and 
pathways. Here we report some examples from our study that illus- 
trate this idea. The full association data set is freely available for 
further analysis and reference at http://www.gwas.eu. 


Detoxification and kidney failure 


N-acetylation is an important mechanism to detoxify nephrotoxic 
medications and environmental toxins. A reduced ability to detoxify 
such substances could lead to impaired kidney function. A key GDM is 
the N-acetyltransferase 8 (NAT8) locus, which was reported to asso- 
ciate with kidney function’®"’. Here we found a highly significant asso- 
ciation of variation at the NAT8 locus with N-acetylornithine. Using 
this information, we investigated whether N-acetylornithine concen- 
trations were associated with kidney function. In both our studies, we 
found a clear association with estimated glomerular filtration rate 
(eGFR), whereby higher levels of N-acetylornithine were correlated 
with lower eGFR (P_KORA = 7.6 × 10 *, P_TwinsUK = 3.6 × 10 ® after
adjusting for age and gender). In accord with the genetic effect of the
NAT8 polymorphism in chronic kidney disease (CKD), the risk allele
identified here was associated with higher N-acetylornithine concen- 
trations. Although causality cannot be inferred from this kind of asso- 
ciation study, the role of ornithine acetylation in the aetiology of CKD 
warrants further exploration. 
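
A covariate-adjusted association of this kind can be sketched as a multiple linear regression of eGFR on the metabolite level together with age and gender. The example below is a minimal illustration in plain Python, assuming ordinary least squares on untransformed values; the model actually fitted in the study is not specified in this section, and all names and the toy data are hypothetical.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular (collinear covariates?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def adjusted_slope(outcome: Sequence[float],
                   predictor: Sequence[float],
                   covariates: Sequence[Sequence[float]]) -> float:
    """OLS coefficient of `predictor` in: outcome ~ 1 + predictor + covariates."""
    rows = [[1.0, float(p)] + [float(v) for v in cov]
            for p, cov in zip(predictor, covariates)]
    k = len(rows[0])
    # Normal equations (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    return solve_linear_system(xtx, xty)[1]


# Made-up toy data: (age in years, gender coded 0/1) per individual.
toy_covariates = [(30, 0), (45, 1), (50, 0), (38, 1), (61, 1), (55, 0)]
toy_metabolite = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# eGFR constructed to fall by 2 units per unit of metabolite, given age and gender.
toy_egfr = [100.0 - 2.0 * m - 0.5 * age + 3.0 * sex
            for m, (age, sex) in zip(toy_metabolite, toy_covariates)]
```

A negative adjusted slope corresponds to the direction reported here, with higher metabolite levels accompanying lower eGFR; a P value would additionally require the standard error of the coefficient, which this sketch omits.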


Diabetes 


Glucokinase (hexokinase 4) regulator (GCKR) is a major pleiotropic 
risk locus associated with diabetes- and cardiometabolic-related 
traits, such as fasting glucose and insulin levels”, triglyceride levels’’ 
and CKD". Here we identified a highly significant association of this 
locus with mannose:glucose ratios. The fasting level of mannose is 
lower in carriers of the risk allele, as opposed to glucose being higher. 
Notably, we also observed a 3.3% increase in lactate concentration per 
copy of the risk allele at the same locus. Little is known about the 
physiological role of mannose, other than its use in protein glycosyla- 
tion. Mannose enters the cell via a specific transporter that is insensi- 
tive to glucose”, and hepatic glycogen breakdown is implicated in the 
maintenance of plasma mannose concentrations’*. These observa- 
tions and the association with GCKR observed here, which is even 
stronger than that of glucose with GCKR, indicate a need for further 
investigation of the role of mannose as a differential biomarker, or 
even as a point of intervention in diabetes care. 


Venous thromboembolism 


With the mass-spectrometry method used here, different forms of the
abundant fibrinogen A-α peptides can be detected. Fibrinogen has a
role in the formation of blood clots. Its active form, the fibrinogen A-α
chain ADSGEGDFXAEGGGVR, can be phosphorylated at serine 3 to
ADpSGEGDFXAEGGGVR". The ratio between the concentrations of
these fibrinogen A-α peptides provides a measure for fibrinogen A-α
phosphorylation (FAαP). Increased levels of FAαP have been observed
under various physiological and pathophysiological conditions’’. Here,
three loci (ABO, ALPL and FUT2) associated with FAαP. Notably, these
three genes are functionally linked: ABO (encoding ABO blood group
(transferase A, α-1-3-N-acetylgalactosaminyltransferase; transferase B,
α-1-3-galactosyltransferase)) and FUT2 (encoding fucosyltransferase 2)
are involved in determining the blood group, and the ABO locus is
associated with blood levels of the alkaline phosphatase ALPL'®. The
association of ALPL with FAαP may be explained either by a genotype-
dependent dephosphorylation of fibrinogen by ALPL, or by a genotype-
dependent change in the phosphorus pool available for FAαP. Variants
in the ABO gene are associated with many different outcomes,
including venous thromboembolism (VTE)””. The association of ABO with
FAαP, and thus with modified blood coagulation properties, provides a
Table 1 | Thirty-seven loci that displayed genome-wide significance in the meta-analysis


Locus & Metabolic trait P value Relationship between gene function and the Biomedical and pharmaceutical interest 
SNP id associated metabolic traits 
ACADS Butyrylcarnitine/propionylcarnitine <4A4 x 10°%° Butyrylcarnitine* and propionylcarnitine* ACADS is a key enzyme in 
rs2066938 are substrates/products of ACADS mitochondrial fatty acid β-oxidation
NAT8 N-acetylornithine 54x10 °°? N-acetyltransferase function of NAT8 Association with glomerular filtration 
rs13391552 matches the associating metabolite and 
N-acetylornithine* CKD; association of N-acetylornithine* 
with eGFR in this study 
FADS1 1-arachidonoylglycerophosphoethanolamine/ 8.5 x 10°1!® FADS1 substrate/product pair ratio Association with LDL cholesterol, HDL
rs174547 1-linoleoylglycerophosphoethanolamine arachidonate (20:4n6) */dihomo- cholesterol and triglycerides, fasting
linolenate (20:3n3 or n6)* is among the glucose and homeostatic model
top associations assessment B (HOMA-B) Crohn’s 
disease and resting heart rate 
UGT1A Bilirubin (E,E)/oleoylcarnitine 2.9 x 10°74 Bilirubin® is a substrate of UGT1A1 Association with hyperbilirubinaemia; 
rs887829 low serum concentrations of bilirubin
associate with increased risk of CAD; a 
SNP in UGT1A1 is a pharmacogenetic 
risk factor for irinotecan toxicity 
ACADM Hexanoylcarnitine/oleate (18:1n9) 2.2x1077! Hexanoylcarnitine* is a substrate of ACADM is a key enzyme in
rs211718 ACADM mitochondrial fatty acid β-oxidation
OPLAH 5-oxoproline 1.5 x 10°59 5-oxoproline* is a substrate of 
rs6558295 5-oxoprolinase OPLAH 
SCD Myristate (14:0)/myristoleate (14:1n5) 2.9x10°°’ SCD catalyses the Δ-9-desaturation of Palmitoleate (16:1n7) is a lipokine
rs603424 fatty acids, such as myristate (14:0)* to linking adipose tissue to systemic 
myristoleate (14:1n5)* metabolism 
and palmitate (16:0)* to palmitoleate 
(16:1n7)* 
GCKR Glucose/mannose 55x10 5% GCKR has a role in glucose homeostasis; Association with type 2 diabetes,
rs780094 strong association with mannose* to fasting glucose, fasting insulin; serum 
glucose* ratios matches the gene’s uric acid; triglyceride levels; 
function C-reactive protein; serum creatinine 
(eGFRcrea), Crohn’s disease and 
hypertriglyceridaemia 
NAT2 1-methylxanthine/4-acetamidobutanoate 1.7x10°4° = 4-acetamidobutanoate*, Association with triglyceride levels and 
rs1495743 1-methylxanthine* and 1-methylurate* | CAD; bladder cancer and toxicities to 
are linked to NAT2 in the xenobiotics docetaxel and thalidomide treatment 
pathways 
CYP3A4 Androsterone sulphate 8.7x10-7° —CYP3A cytochrome P450 proteins Genetic variation in androsterone 
rs17277546 metabolise androsterone sulphate * metabolism is linked to the incidence of 
prostate cancer 
ABO ADpSGEGDFXAEGGGVR/ADSGEGDFXAEGGGVR 9.1 x 104° Polymorphisms in ABO determine the Association with blood alkaline 
rs612169 blood group; association with fibrinogen phosphatase level; pancreatic cancer; 
peptide phosphorylation’; additive effect venous thromboembolism and 
on fibrinogen A-α phosphorylation phytosterol levels
together with FUT2 and ALPL 
SLC2A9 Urate 5.5 x 10°-3* — SLC2A9 (GLUT) transports uric acid* Association with gout; several SNPs in 
rs4481233 SLC2A9 associate with etoposide IC50
CYP4A 10-nonadecenoate (19:1n9)/10-undecenoate 5.1 103% = Cytochrome P450, family 4, subfamily A, Possible role in the etiology of hepatic 
rs9332998 (11:1n1) are fatty acid ω-hydroxylases; steatosis, in interaction with stearoyl-
10-undecenoate (11:1n1)* is coenzyme A desaturase 1
biochemically related to ω-hydroxylated
C10 fatty acids 
CPS1 Glycine 1.6 x10°*”? Association with glycine* and creatine’; | Metabolomics data indicates that this 
rs2216405 creatine is produced from glycine; glycine association is related to a perturbed 
is metabolically related to carbamoyl ammonia metabolism 
phosphate, which is the product of CPS1 
and the entry point of ammonia into the 
urea cycle 
LACTB Succinylcarnitine 72x10’ — Association with succinylcarnitine*; LACTB transgenic mice are obese 
rs2652822 perturbed hepatic gene expression in
transgenic LACTB mice indicates a role of 
LACTB in the butanoate/succinate* 
pathway 
SLC22A1 Isobutyrylcarnitine 7.3.x10°2° SLC22A1 (OCT1) translocates a broad Genetic variations in the SLC22A1/
rs662138 array of organic cations; possibly also SLC22A2 region are determinants of 
isobutyrylcarnitine* or related metabolites metformin pharmacokinetics 
SLCO1B1 Eicosenoate (20:1n9 or 11)/tetradecanedioate 2.81022 SLCO1B1 (OATP2, OATP-C) is an organic Common variants in SLCO1B1 are 
rs4149081 anion transporter strongly associated with an increased 
risk of statin-induced myopathy 
FUT2 ADpSGEGDFXAEGGGVR/ 43 x10°7° FUT2 is involved in the creation of a Association with vitamin B12 levels, 
rs503279 ADSGEGDFXAEGGGVR precursor of an H antigen, and has an total cholesterol and Crohn’s disease; 
additive effect on fibrinogen A-α vitamin B12 deficiency is associated
phosphorylation together with ABO and with cognitive decline, cancer and CAD 
ALPL 
ACE Aspartylphenylalanine 8.2x10-°° Angiotensin I converting enzyme Association with angiotensin-
rs4329 (peptidyl-dipeptidase A) 1 is associated converting enzyme activity; potential
with the dipeptide aspartylphenylalanine* genetic interaction with KLKB1 locus
PHGDH Serine 2.6 x 10°14 PHGDH catalyses the first and 
rs477992 rate-limiting step in the phosphorylated 
pathway of serine* biosynthesis 
ENPEP ADpSGEGDFXAEGGGVR/DSGEGDFXAEGGGVR 6.5 x10 1? ENPEP (APA, aminopeptidase A) is an ENPEP has a role in the catabolic 
rs2087160 amino-terminal amino peptidase; pathway of the renin-angiotensin 
association with ratios between system, and regulates blood pressure; 
fibrinogen A-α peptide association with blood pressure in
ADSGEGDFXAEGGGVR* and its Asian population 
N-terminal cleaved form 
DSGEGDFXAEGGGVR* indicate that 
fibrinogen is a substrate of ENPEP 
AKR1C Androsterone sulphate/epiandrosterone 6.7 x 10°18 AKR1C isoforms have a role in androgen* AKR1C has a role in the etiology of
rs2518049 sulphate metabolism cancers including prostate, brain, 
breast, bladder and leukaemia; 
potential target of jasmonates in cancer 
cells 
NT5E Inosine 7.4 x10718 Inosine* is a substrate of the 5’- NT5E is involved in purine salvage 
rs494562 nucleotidase NT5E 
PRODH Proline 2.0 x 10° ~PRODH catalyses the first step in proline* 
rs2023634 degradation 
HPS5 α-hydroxyisovalerate 1.0x10°*° α-hydroxyisovalerate* is found in urine of Melatonin homeostasis is deranged in
rs2403254 patients with phenylketonuria; patients with loss of HPS genes 
phenylalanine is required for melatonin (albinism) 
biosynthesis 
ALPL ADpSGEGDFXAEGGGVR/DSGEGDFXAEGGGVR 2.910 °° ALPL is a phosphatase and associates
rs10799701 with A-α fibrinogen phosphorylation*; it
has an additive effect on fibrinogen A-α
phosphorylation together with ABO and 
of FUT2 
SLC7A6 Glutaroyl carnitine/lysine 98x10 1% — Glutaryl-CoA* is an intermediate in the Deficiencies in glutaryl-CoA 
rs6499165 metabolism of lysine* and tryptophan dehydrogenase are linked to metabolic 
disorders 
KLKB1 Bradykinin, des-arg(9) 6.6 x10 18 Kallikrein B, plasma (Fletcher factor) 1; Association of bradykinin* with 
rs4253252 kallikrein-kininogen complex binds to hypertension confirmed in this study; 
cell surface receptors leading to the potential genetic interaction with ACE 
targeted action of bradykinin* locus 
GLS2 Glutamine 3.1x1071” ~~ GLS2 catalyses the hydrolysis of 
rs2657879 glutamine* 
PDXDC1 1-eicosatrienoylglycerophosphocholine/ 45x10 !© Association with the 1-eicosadienoyl- to Association with body height 
rs7200543 1-linoleoylglycerophosphocholine 1-eicosatrienoyl-glycerophosphocholine*
ratio indicates a role of PDXDC1 in the 
metabolism of C20:2 and C20:3 fatty 
acids 
SLC22A4 Isovalerylcarnitine 74x10°1'® SLC22A4 (OCTN1) transports Association with body height
rs272889 isovalerylcarnitine* 
AHR Caffeine/quinate 48x10°'5 AHR is a transcription factor for CYP1A1,
rs12670403 which metabolises caffeine* 
ETFDH Decanoylcarnitine 5.5 x 10-18 Decanoylcarnitine* is used for energy ETFDH is a key enzyme in 
rs8396 production via β-oxidation to the electron mitochondrial fatty acid β-oxidation
transfer complex 
ELOVL2 Docosahexaenoate 1.7x10°'4 ~— EPA (20:5n3)* is a substrate of ELOVL2; 
rs9393903 (DHA; 22:6n3)/eicosapentaenoate DHA (22:6n3)* is related to its product by 
(EPA; 20:5n3) a single desaturation reaction 
SLC16A9 Carnitine 3.410714 — SLC16A9 (MCT9) transports free 
rs7094971 carnitine * (Shown in this paper) 
IVD 3-(4-hydroxyphenyl)lactate/isovalerylcarnitine 1.1 x 10-18 Isovalerylcarnitine* is a transport form of IVD is a key enzyme in mitochondrial 
rs10518693 isovalerate, which is the substrate of fatty acid β-oxidation
isovaleryl coenzyme A dehydrogenase 
(IVD) 
SLC16A10 Isoleucine/tyrosine 1.4x10°!% SLC16A10 encodes the T-type amino
rs7760535 acid transporter-1 (TAT1); this transporter 
transports tyrosine* and phenylalanine* 
The metabolic trait with the strongest association at the discovery stage in both studies is reported, together with the SNP identifier and the P value of association from the meta-analysis. Full association data are 
available in Supplementary Tables 3 & 5 and at http://www.gwas.eu. The loci are labelled by the gene that is considered most likely to carry the causative SNP. Where the metabolic trait is consistent with a nearby
gene’s function, details are provided in the column labelled ‘Relationship between gene function and the associated metabolic traits’. Overlaps with associations from other GWAS studies are highlighted in bold
(r² > 0.8, details are in Supplementary Table 6). *, Metabolic traits that are associated with the SNP at the corresponding locus. Further information and full bibliographic references are presented in
Supplementary Table 4.


functional explanation for the reported association of ABO with VTE 
risk. Moreover, if FAαP is at the basis of VTE, then FUT2 and ALPL
should also be investigated as VTE risk genes. This hypothesis can now 
be tested in the respective patient groups. 


Coronary artery disease 


We have shown previously* that strong associations with metabolic 
traits, derived from GWAS, can point to interesting associations with 


clinical endpoints that would not otherwise be considered relevant. A 
recent meta-analysis with lipid traits”, using a similar strategy, iden- 
tified several genetic loci that were found to affect the risk of coronary 
artery disease (CAD) in the CARDIoGRAM study”. Six of these loci 
are also reported here (ABO, NAT2, CPS1, NAT8, ALPL and KLKB1), 
although some of them showed only weak evidence for association 
with CAD (P<0.01) in the CARDIoGRAM study (Supplemen- 
tary Table 8). Although the links are not statistically strong, the 
biochemical function of the associated metabolic traits identified here
may support a possible role in heart disease. For example, NAT8 may 
be linked to CKD via ornithine acetylation (see above). KLKB1, encod- 
ing kallikrein B plasma (Fletcher factor) 1, controls blood pressure via 
the bradykinin pathway. In this study, a genetic variant in KLKB1 was 
associated with bradykinin concentrations and we also confirmed the 
expected directional association of bradykinin with hypertension in 
both our studies (P_KORA = 1.7 × 107°, P_TwinsUK = 0.0495, with the
covariates age and gender). ABO and ALPL associated with FAαP,
and we therefore speculate that genetically determined differences in
FAαP and resulting blood-coagulation properties may be the basis of
these associations with CAD. Furthermore, our associations indicate 
Figure 1 | Genetic basis of human metabolic individuality and its overlap
with loci of biomedical and pharmaceutical interest. More than 100 years 
ago, it was realised that inborn errors in human metabolism were ‘merely 
extreme examples of variations of chemical behaviour which are probably 
everywhere present in minor degrees’ and that this “chemical individuality 
(confers) predisposition to and immunities from the various mishaps which are 
spoken of as diseases”. The 37 genetically determined metabotypes (GDMs) 
that we have reported here explain a highly relevant amount of the total 
variation in the studied population and therefore contribute substantially to the 
genetic part of human metabolic individuality. a, GDMs are shown colour- 
coded by general metabolic pathways, together with selected associated 
metabolic traits, highlighting the relationship between gene function and the 
associated metabolic trait (see column 4 in Table 1). b, GDMs colour-coded by 
overlap with associations in previous GWAS with disease (red), intermediate 
risk factors for disease (yellow) and other traits (green). Locus overlap is defined 
here by the lead SNP reported in the National Human Genome Research Institute
(NHGRI) GWAS catalogue being highly correlated (r² ≥ 0.8) with the most
strongly associated SNP in the metabolomics scan (see column 5 in Table 1 and
Supplementary Table 7). Note that the overlap between the metabolomics loci
and the loci reported by the NHGRI GWAS catalogue is highly significant when
compared to a draw of 37 randomly selected SNPs with similar properties
(P < 3 × 10 °, see Methods).


that the role of FAαP as a biomarker for acute myocardial infarction,
and the combined additive genetic effect of the ABO, ALPL and FUT2 
loci (Supplementary Fig. 4) on CAD risk, should be investigated in 
greater detail. 


New biological and functional insights 

Genome-wide association studies merely uncover statistically sig- 
nificant associations, and can therefore only generate biological 
hypotheses. Although providing experimental validation of all asso- 
ciations is beyond the scope of a single study, we nevertheless 
attempted to show that, in principle, validation is possible. The asso- 
ciation of SNP rs7094971 in solute carrier family 16, member 9 
Figure 2 | Experimental evidence for SLC16A9 (MCT9) as a carnitine efflux
transporter. a, b, When 4.6 nl of [³H]-carnitine was injected into Xenopus
oocytes, followed by incubation in medium for 90 min, efflux was significantly
higher in oocytes expressing MCT9 than in the non-injected (NI, a) or water-
injected (WI, b) controls. By contrast, when oocytes were incubated in medium
containing [³H]-carnitine (4 µCi ml⁻¹), there was no significant uptake,
indicating that MCT9 does not mediate carnitine uptake (data not shown).
Because some previously characterized monocarboxylic acid transporters are
proton-coupled”’, the experiments were conducted at both pHout 7.4 and
pHout 5.5, but no significant difference was observed (a and data not shown). In
agreement with this, external unlabelled carnitine was unable to trans-stimulate
[³H]-carnitine efflux, with no significant difference in efflux between MCT9-
expressing oocytes in the absence or presence of 5 mM carnitine (MCT9 versus
MCT9 + carn, b). Data are means ± s.e.m. of 6-10 oocytes per data point from
2 oocyte preparations. The y-axes represent remaining [³H]-carnitine levels
(c.p.m. per oocyte). Statistical significance was determined by the Student’s 
t-test. These results are consistent with MCT9 acting as a unidirectional 
carnitine efflux system when expressed in Xenopus oocytes. Additional 
experiments are required to establish the full substrate specificity of MCT9. If 
future studies show an appropriate cellular distribution, MCT9 could be 
responsible for carnitine efflux across the basolateral membrane of absorptive 
epithelial cells, after absorption via the well-characterized SLC22 (also known 
as OCTN) family of apical epithelial proton-coupled carnitine transporters”. 
(SLC16A9, also known as MCT9) with carnitine indicated that this
metabolite is the substrate of this hitherto uncharacterized monocarboxylic
acid transporter. We therefore tested [³H]-carnitine uptake by
SLC16A9-expressing Xenopus oocytes. Our data (Fig. 2) show that
SLC16A9 is a pH-independent carnitine efflux transporter,
possibly responsible for carnitine efflux from absorptive epithelia into 
the blood. 

Another prominent example is the highly significant association of 
increased urate levels, and their clinical complication of gout, with 
variants in the SLC2A9 gene”’. The association between SLC2A9 var- 
iants and urate levels was also observed here. Although it was previ- 
ously annotated as a glucose transporter, SLC2A9 was later shown” to 
encode a high-capacity urate transporter. Similar characterization 
experiments by specialists in the related fields should be motivated 
and guided by our association data. Among the 37 GDMs reported 
here, we suggest that the associations with coarsely-characterized 
genes for enzymes and transporters that are known disease-risk loci 
may warrant further experimental investigation, for instance in 
experiments using isotope-labelled derivatives of the associated metabolites
reported here as putative target substrates. We deem NAT8 to
be a prime candidate for such a study.


Pharmacogenomics 


Using the pharmacogenomics knowledge base~*, we identified six 
GDMs as being previously associated with toxicity or adverse reactions
to medication. Noteworthy are polymorphisms in the NAT2 and
CYP4A loci that associated with toxicities to docetaxel and thalidomide
treatment”; the UGT1A locus with irinotecan toxicity*®, SLC2A9 with
the IC50 of etoposide’, SLC22A1 with metformin pharmacokinetics*”’
and SLCO1B1 with statin-induced myopathy”. In all cases,
our associations with metabolic traits at these loci provide a possible 
novel biochemical basis for the genotype-dependent reaction to drug 
treatment, such as the association of SLCO1B1 with a series of fatty 
acids, including tetradecanedioate and hexadecanedioate. This 
information can be used to support the redesign of the respective drug 
molecules to avoid adverse reactions. Moreover, systematic inclusion 
of biochemically relevant GDMs as candidate SNPs during drug trials 
may permit early identification of potentially adverse pharmacogenetic 
effects. This applies specifically to AKR1C, which is a novel target of
jasmonates in cancer cells*’. We reported a GDM associated with
AKR1C which has a large effect-size on androgen metabolism. The
influence of SNP rs2518049 in AKR1C on the efficiency and potential
side effects of jasmonates should therefore be assessed in future clinical 
trials. 


Discussion 


Owing to their large effect-size and high explained variance, the 37 
GDMs reported in this study indicate key genetic loci underpinning 
differences in human metabolism. Inclusion of these genetic variants 
in the statistical analysis of pre-clinical and clinical studies may facil- 
itate identification of genotype-dependent outcomes, such as disease 
complications and adverse drug reactions. In two cases, we could 
establish a direct functional link, supported by both our studies, 
between a genetic variant, an intermediate metabolic trait and a dis- 
ease-relevant endpoint: KLKB1 with bradykinin and hypertension, 
and NAT8 with N-acetylornithine and eGFR. We note that by discussing
only associations that are supported by two independent
studies at genome-wide significance, we have chosen to take a very 
conservative approach. On the basis of QQ-plots and coarse assump- 
tions, we estimate that more than 500 loci with signals of association 
below that conservative threshold may be confirmed as GDMs in 
more highly powered studies in the future. Technically, it is of note 
that by using a single study to profile 2,820 individuals metabolically, 
using only 100 µl of blood serum, we replicated in this study a wide
series of findings from previous large GWAS with quantitative traits, 
including serum levels of fasting glucose’, bilirubin**”’, urate** and 
dehydroisoandrosterone sulphate**. Our study shows how GWAS
with intermediate traits that are close to the underlying biological 
processes can provide new functional insights into associations from 
GWAS with the endpoints of complex chronic disease and drug 
toxicity. Future GWAS that combine multiple ‘omics’ technologies 
in a single study, including transcriptomics, proteomics, metabolo- 
mics and recent technologies for determining epigenetic modifica- 
tions on a genome-wide scale, are likely to be the next big step 
towards a full understanding of the interaction between genetic pre- 
dispositions and environmental factors in the development of com- 
plex chronic diseases, their diagnosis, prevention and safe and 
efficient therapy. 


METHODS SUMMARY 


The full Methods section provides information about study design, genetic and 
metabolic data collection, and data analysis. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


Study populations. The KORA S4 survey, an independent population-based 
sample from the general population living in the region of Augsburg, southern 
Germany, was conducted in 1999-2001. The study design and standardized 
examinations of the survey (4,261 participants, response 67%) have been 
described in detail (ref. 39 and references therein). A total of 3,080 subjects
participated in a follow-up examination, KORA F4, in 2006-2008, comprising
individuals who, at that time, were aged 32-81 years. The TwinsUK cohort is a 
British adult-twin registry in the age range 8-102 years and 84% are female. The 
samples used in this study are aged 23-85 (mean age 48 years) and 97% are female. 
These unselected twins were recruited from the general population through 
national media campaigns and were shown to be comparable to age-matched 
population singletons in terms of disease-related and lifestyle characteristics”. 
In both studies, written informed consent has been given by all participants and 
the studies have been approved by the local ethics committees (Bayerische 
Landesärztekammer for KORA and Guy’s and St. Thomas’ Hospital Ethics
Committee for TwinsUK). 

Blood sampling. Blood samples for metabolic analysis and DNA extraction from 
KORA were collected between 2006 and 2008 as part of the KORA F4 follow-up. 
To avoid variation due to circadian rhythm, blood was drawn in the morning 
between 08:00 and 10:30 after a period of at least 10 h overnight fasting. Material 
was drawn into serum gel tubes, gently inverted twice and then allowed to rest for 
30 min at room temperature (18-25 °C) to obtain complete coagulation. The
material was then centrifuged for 10 min (2,750g at 15 °C). Serum was divided
into aliquots and kept for a maximum of 6 h at 4 °C, after which it was frozen at
−80 °C until analysis. For the TwinsUK study, blood samples were taken after at
least 6 h of fasting. The samples were immediately inverted three times, followed
by 40 min of resting at 4 °C to obtain complete coagulation. The samples were
then centrifuged for 10 min at 2,000g. Serum was removed from the centrifuged
brown-topped tubes as the top, yellow, translucent layer of liquid. Four aliquots of
1.5 ml were placed into skirted microcentrifuge tubes and then stored at −45 °C
until sampling. 

Metabolomics measurements. Metabolon, a commercial supplier of metabolic 
analyses, developed a platform that integrates the chemical analysis, including 
identification and relative quantification, data-reduction and quality-assurance 
components of the process. The analytical platform incorporates two separate 
ultrahigh-performance liquid chromatography/tandem mass spectrometry 
(UHPLC/MS/MS²) injections and one gas chromatography/mass spectrometry
(GC/MS) injection per sample. The UHPLC injections were optimized for basic 
and acidic species. A total of 295 metabolites were measured, spanning several 
relevant classes (amino acids, acylcarnitines, sphingomyelins, glycerophospholipids, 
carbohydrates, vitamins, lipids, nucleotides, peptides, xenobiotics and steroids; a full 
list of metabolites is given in Supplementary Table 1). The detection of the entire 
panel was carried out with 24 min of instrument analysis time (two injections at 
12 min each), while maintaining low median process variability (<12% across all 
compounds). The resulting MS/MS² data were searched against a standard library
generated by Metabolon that included retention time, molecular mass to charge ratio 
(m/z), preferred adducts and in-source fragments as well as their associated MS/MS 
spectra for all molecules in the library. The library allowed for the identification of 
the experimentally detected molecules on the basis of a multiparameter match 
without the need for additional analyses. Metabolon has shown in a recent publica- 
tion that their integrated platform enabled the high-throughput collection and 
relative quantitative analysis of analytical data and identified a large number and 
broad spectrum of molecules with a high degree of confidence’. The Metabolon 
platform has, among other studies, been successfully applied in the analysis of the 
adult human plasma metabolome*! and the identification of sarcosine as a bio- 
marker for prostate cancer“. 

Quality control of metabolomics data. For this study we measured the 
Metabolon panel in human blood from 1,768 individuals of the KORA cohort 
and in 1,052 individuals of the TwinsUK cohort. Quality control data (relative 
standard deviation, upper and lower 95% confidence interval and minimum and 
maximum observed values in quality control samples) are reported in Sup- 
plementary Table 1. To avoid spurious false-positive associations due to small 
sample sizes, only metabolic traits with at least 300 non-missing values were 
included, and data points of metabolic traits that lay more than three standard 
deviations off the mean were excluded by setting them to ‘missing’ in the analysis: 
276 of 295 available metabolites and 37,179 metabolite ratios satisfied this 
criterion in KORA, resulting in a total of 37,455 metabolic traits. For the 
TwinsUK study, identical selection criteria for metabolic traits were used, result- 
ing in 258 metabolites and 32,499 metabolite ratios, and a total of 32,757 
metabolic traits. 
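
The two quality-control rules described above can be sketched as follows. This is an illustration only: whether outliers were assessed on raw or transformed values, and in which order the two rules were applied, is not stated here, so the sketch assumes raw values and masks outliers first. The names are hypothetical, and `None` stands for a missing value.

```python
import math
from statistics import mean, stdev
from typing import List, Optional

Values = List[Optional[float]]  # None marks a missing measurement


def mask_outliers(values: Values, n_sd: float = 3.0) -> Values:
    """Set points lying more than n_sd standard deviations from the mean to None."""
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        return list(values)
    mu, sd = mean(observed), stdev(observed)
    return [None if v is not None and abs(v - mu) > n_sd * sd else v
            for v in values]


def enough_data(values: Values, min_non_missing: int = 300) -> bool:
    """Keep a metabolic trait only if it has enough non-missing values."""
    return sum(v is not None for v in values) >= min_non_missing


def ratio_trait(numerator: Values, denominator: Values) -> Values:
    """Per-individual metabolite ratio; missing if either input is missing."""
    return [None if a is None or b is None or b == 0 else a / b
            for a, b in zip(numerator, denominator)]


# Number of unordered metabolite pairs among the 276 retained KORA metabolites.
MAX_RATIOS_KORA = math.comb(276, 2)
```

If ratios are counted as unordered pairs (an assumption), 276 metabolites give at most 276 × 275 / 2 = 37,950 ratios. The 37,179 ratios reported for KORA lie below this bound, which would be consistent with some ratios failing the 300-value rule.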

Genotyping and imputation. For all individuals profiled from the KORA study, 
genome-wide SNP data were already available. GWAS data of KORA and 
TwinsUK have been used and described extensively in the past, in the context
of numerous GWAS and meta-analyses*****. We therefore summarize only the 
essential details here. Genotyping of the KORA F4 population was carried out 
using the Affymetrix GeneChip array 6.0. Genotypes were determined using the 
Birdseed2 clustering algorithm. For quality assurance, we applied the criteria of 
call rate >95% and P(Hardy-Weinberg) > 10° as filters for SNP quality: 
655,658 autosomal SNPs satisfied these criteria. These genotyped SNPs were used 
for genome-wide analysis of the metabolic traits. For selection of the best-assoc- 
iated SNP within a region in a meta-analysis of KORA and TwinsUK, we used 
genotyped SNPs as well as dosages of imputed SNPs. In KORA F4, imputation 
was done using IMPUTE v0.4.2 (ref. 44) based on HapMap2 (see below).
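
The SNP quality filter for KORA combines a call-rate cut-off with a Hardy-Weinberg test. The sketch below shows one way such a filter could look. It uses a one-degree-of-freedom chi-square goodness-of-fit test, which is an assumption (the text does not say which Hardy-Weinberg test was used), and the P-value cut-off of 1e-6 is a placeholder default because the exponent is not legible in this copy of the text. All function names are hypothetical.

```python
import math


def hwe_chi2_p_value(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P value of a 1-d.f. chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotype calls")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_snp_qc(call_rate: float, hwe_p: float,
                  min_call_rate: float = 0.95,
                  min_hwe_p: float = 1e-6) -> bool:
    """Call rate > 95% and Hardy-Weinberg P above the (assumed) threshold."""
    return call_rate > min_call_rate and hwe_p > min_hwe_p
```

For example, a SNP with genotype counts of 25, 50 and 25 matches Hardy-Weinberg proportions exactly, whereas a SNP with no heterozygotes among 100 samples would be rejected at any commonly used threshold.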

Genotyping of the TwinsUK data set was done with a combination of Illumina 
arrays (HumanHap300, HumanHap610Q, 1M-Duo and 1.2MDuo 1M)**"°. We 
pooled the normalized intensity data for each of the three arrays separately (with 
1M-Duo and 1.2MDuo 1M pooled together). For each data set, we used the 
Illuminus calling algorithm’ to assign genotypes in the pooled data. No calls
were assigned if an individual’s most likely genotype was called with a posterior 
probability threshold of <0.95. Validation of pooling was achieved via a visual 
inspection of 100 random, shared SNPs for overt batch effects. Finally, intensity 
cluster plots of significant SNPs were visually inspected for over-dispersion- 
biased no calling, and/or erroneous genotype assignment. SNPs showing any of 
these characteristics were discarded. 

We applied similar exclusion criteria to each of the three data sets separately.
Exclusion criteria for samples were: (1) sample call rate <98%; (2) heterozygosity
across all SNPs ≥ 2 s.d. from the sample mean; (3) evidence of non-European
ancestry as assessed by principal-component-analysis comparison with HapMap3
populations; (4) observed pairwise identity-by-descent (IBD) probabilities indicative
of sample identity errors. We corrected misclassified monozygotic and dizygotic
twins on the basis of IBD probabilities. Exclusion criteria for SNPs were: (1) Hardy-
Weinberg P value <10⁻⁶, assessed in a set of unrelated samples; (2) minor allele
frequency (MAF) <1%, assessed in a set of unrelated samples; (3) SNP call rate
<97% (SNPs with MAF ≥ 5%) or <99% (for 1% ≤ MAF < 5%).
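
As an illustration, the three SNP exclusion criteria above can be written as a single predicate. This is a hedged sketch: the function name and argument layout are hypothetical, and the Hardy-Weinberg threshold is exposed as a parameter because its exponent is not legible in the text at hand; the default of 1e-6 is an assumption.

```python
def snp_fails_qc(hwe_p, maf, call_rate, hwe_threshold=1e-6):
    """Return True if a SNP should be excluded under the listed criteria.

    hwe_p      Hardy-Weinberg P value (computed in unrelated samples)
    maf        minor allele frequency as a fraction (0.01 means 1%)
    call_rate  SNP call rate as a fraction (0.97 means 97%)
    hwe_threshold is an assumed default, not a value confirmed by the source.
    """
    if hwe_p < hwe_threshold:
        return True
    if maf < 0.01:
        return True
    # Common SNPs tolerate a lower call rate than low-frequency SNPs.
    min_call_rate = 0.97 if maf >= 0.05 else 0.99
    return call_rate < min_call_rate
```

The MAF-dependent call-rate cut-off means that a SNP with a 98% call rate passes at MAF 10% but is excluded at MAF 3%.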

Alleles of all three data sets were aligned to HapMap2 or HapMap3 forward-strand
alleles. Before merging, we performed pairwise comparison among the
three data sets and further excluded SNPs and samples to avoid spurious genotyping
effects, identified as follows: (1) concordance at duplicate samples <1%;
(2) concordance at duplicate SNPs <1%; (3) visual inspection of QQ-plots for
logistic regression applied to all pairwise data-set comparisons; (4) Hardy-
Weinberg P value <10⁻⁶, assessed in a set of unrelated samples; (5) observed
pairwise IBD probabilities indicative of sample identity errors. We then merged the
three data sets, keeping individuals typed at the largest number of SNPs when an
individual was typed at two different arrays. The merged data set consists of 5,654
individuals (2,040 from the HumanHap300, 3,461 from the HumanHap610Q and
153 from the HumanHap1M and 1.2M arrays), and up to 874,733 SNPs depending on
the data set (HumanHap300: 303,940, HumanHap610Q: 553,487, HumanHap1M
and 1.2M: 874,733). Imputation was performed using the IMPUTE software
package (v2), using two reference panels, P0 (HapMap2, rel 22, combined
CEU+YRI+ASN panels) and P1 (610k+, including the combined
HumanHap610k and 1M reduced to 610k SNP content). The analysis of this study
used 534,665 autosomal SNPs (basically 610K SNPs extracted from the final
merged data set).

Statistical analyses. The primary association testing was carried out using linear
regressions on all metabolite concentrations and all possible ratios of metabolite
concentrations. This was motivated by our previous observation that the use of
ratios may lead to a strong reduction in the overall trait variance. A test of
normality showed that in 29,338 cases, the log-transformed ratio distribution
was significantly better represented by a normal distribution than when untransformed
ratios were used. In 5,145 cases, the untransformed distribution was
closer to a normal distribution. For concentrations, 149 were closer to a log-normal
distribution and 124 were better represented by a normal distribution.
On the basis of this observation, and also for the sake of simplicity, we decided to
log-transform all metabolites and their ratios. We used P-gain statistics to quantify
the decrease in P value for the association with the ratio compared to the P values
of the two corresponding concentrations. A high P-gain (more than 250) indicates
that two metabolites are likely to be functionally linked in a metabolic
pathway that has an impact on the associating genotype. KORA and TwinsUK
are population-based studies. They comprise only individuals who are not
displaying any severe clinical symptoms at the time of sampling. Therefore, disease
state was not considered as a confounding factor in the statistical analysis. In
KORA, the software PLINK (version 1.06) and SNPTEST were used with age and
gender as covariates. To account for the family structure in the TwinsUK study,
we used variance components applied to a score test implemented in the software
Merlin.

Correction for multiple testing. We applied a conservative Bonferroni correction
to control for false-positive error rates deriving from multiple testing. Using the
KORA study as a reference, we corrected for tests on 655,658 SNPs and 37,455
metabolic traits, thus obtaining a Bonferroni-adjusted P value of P = 2.04 × 10⁻¹².
For ratios, we also required that the increase in the strength of association,
expressed as the change in P value when using ratios compared to the larger of
the two P values when using two metabolite concentrations individually (P-gain),
should be larger than the number of tested metabolic traits (P-gain > 250). This
limit is considered a Bonferroni-type conservative cutoff for identifying those
metabolite concentration pairs for which the use of ratios strongly improves the
strength of association. In addition to the strongest associating metabolic trait,
others can often provide additional insight into the underlying biochemical
processes. In such cases, we consider a P value of 1.33 × 10⁻⁶ to represent a
conservative level of significance (Bonferroni correction for 37,455 tests at a nominal
significance level of 5%).
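
The two significance thresholds quoted above follow directly from dividing the nominal 5% level by the number of tests. The short sketch below reproduces that arithmetic; the function and constant names are illustrative only.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


N_SNPS = 655_658    # genotyped autosomal SNPs in KORA
N_TRAITS = 37_455   # metabolic traits (concentrations plus ratios) in KORA

# Genome-wide threshold: every SNP tested against every trait.
genome_wide = bonferroni_threshold(0.05, N_SNPS * N_TRAITS)
# Single-locus threshold: all traits tested at one SNP.
single_locus = bonferroni_threshold(0.05, N_TRAITS)

print(f"{genome_wide:.2e}")   # 2.04e-12
print(f"{single_locus:.2e}")  # 1.33e-06
```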

Inflation. In most cases, the assumption of a linear additive model was valid (see 
box plots in Supplementary Fig. 3) and there was no inflation of summary statistics, 
which could be indicative of population stratification (see QQ-plots in 
Supplementary Fig. 3). Lambda values ranged from 0.965 to 1.024 (median = 
1.006) in KORA, and from 0.940 to 1.013 (median = 0.985) in TwinsUK. 
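
The text does not state how the lambda values were computed. A common definition of the genomic inflation factor, assumed here, is the median of the observed 1-d.f. chi-square statistics divided by the median expected under the null (about 0.455). The sketch below converts two-sided P values back to chi-square statistics using only the Python standard library; the function name is hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(p_values):
    """Genomic inflation factor (lambda) from two-sided P values in (0, 1].

    Each P value is mapped to a 1-d.f. chi-square statistic via the squared
    normal quantile. This is an assumed, conventional definition and not
    necessarily the exact procedure used in the study.
    """
    nd = NormalDist()
    chi2 = [nd.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN
```

P values that are uniformly distributed give a lambda close to 1; values well above 1 would point to inflation such as that caused by population stratification.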

Candidate gene selection and overlap with disease loci. Regional association
plots (Supplementary Fig. 3) were created using imputed and meta-analysed data.
Within this region, the SNP with the strongest signal of association in the
meta-analysis was retained as the final SNP to be reported. Association data for all
metabolic traits at the 37 SNPs reported in Table 1 (for KORA, TwinsUK and
meta-analysis), limited to associations with P < 1.33 × 10⁻⁶ (Bonferroni correction
for multiple testing of metabolic traits at a single locus) and P-gain > 250 (for
ratios) in the meta-analysis, are reported in Supplementary Table 4. For the
strongest associating trait, box plots were plotted to visualize the actual quantitative
dependence of the trait on genotype (Supplementary Fig. 3). On the basis of
association data alone, it is not possible in most cases to identify the gene within a
locus that causes the association. However, using knowledge of the function of
genes in LD with the reported SNP, as well as the biochemical characteristics of
the associating metabolite, it is possible to identify a single most likely candidate
gene in many cases. These cases are tagged as ‘match between gene function and
metabolic trait’ and are supported by arguments provided as Supplementary Text
(for example, the association between a SNP in LD with OPLAH (oxoprolinase)
and oxoproline concentrations). At two loci (CYP4A and UGT1A), alternative
splice variants exist. We named these loci without attempting to specify the exact
variant.

GWAS catalogue. Using the catalogue of published GWAS (accessed 10 October
2010), we identified for each entry the SNPs in the KORA and TwinsUK studies
that correlate most strongly with a previously reported SNP (r² ≥ 0.5) and that
were present in our association database (P< 10 *, P-gain > 10). The resulting 
associations are available online on our GWAS server. New associations will be 
included as the database of published GWAS is updated. 

Enrichment analysis. We downloaded the current version of the GWAS catalogue
from NHGRI and deleted all records that correspond to our previous studies. As a
sampling data set, we chose the 655,658 SNPs from the Affymetrix 6.0 array,
which have been tested in the KORA part of this study. The 37 SNPs that we
report are from this array and can therefore be considered to represent one draw
out of this set. We then drew 1,000,000 sets of 37 SNPs at random (with replacement)
from this sampling data set. To ensure comparable MAF distributions
between the reference and the random set, we then rejected all draws in which the
mean or the variance of the MAF distributions was significantly different
(P < 0.05) between the random and the reference set: 330,775 random sets were
hence retained. Using an LD criterion of r² > 0.8 (based on HapMap? release 27,
NCBI B36, CEU population), we then counted the overlap with the GWAS
catalogue for every random set. The reference set was included as a technical
positive control in the computations. For the 330,775 tested random sets, at most
six overlapping SNPs were found (8 times), and in more than half of the cases, no
overlapping SNPs were present in the sampled data set (see Supplementary Table
9). For our reported 37 metabolomics SNPs, we identified 14 overlapping SNPs
(note that we report 15 overlapping loci in Fig. 1; the ENPEP locus was not yet
included in the GWAS catalogue and was not used in this analysis). Because we
never found 14 overlapping loci by chance, the P value of our observations being
due to chance is less than 1/330,775 ≈ 3 × 10⁻⁶.
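
The resampling logic behind this bound can be sketched in a few lines. This is a simplified illustration with hypothetical names: it draws random SNP sets and counts exact catalogue matches, and it deliberately omits the MAF-matching rejection step and the LD-based (r²) overlap definition described above.

```python
import random


def draw_snp_set(pool, k, rng):
    """Draw k SNP identifiers at random, with replacement."""
    return rng.choices(pool, k=k)


def count_overlap(snp_set, catalogue):
    """Number of drawn SNPs found in the catalogue (exact match only)."""
    return sum(1 for snp in snp_set if snp in catalogue)


def empirical_p_or_bound(random_overlaps, observed):
    """Fraction of random sets with overlap >= observed.

    If no random set reaches the observed overlap, return 1/N, which is
    then an upper bound on the P value rather than an estimate.
    """
    n = len(random_overlaps)
    hits = sum(1 for x in random_overlaps if x >= observed)
    return hits / n if hits else 1.0 / n


# Toy usage with made-up identifiers (not study data).
rng = random.Random(0)
pool = list(range(10_000))
catalogue = set(range(200))
overlaps = [count_overlap(draw_snp_set(pool, 37, rng), catalogue)
            for _ in range(1_000)]
```

With 330,775 retained random sets and none reaching 14 overlaps, `empirical_p_or_bound` returns 1/330,775, matching the bound of about 3 × 10⁻⁶ given in the text.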

Functional characterization of SLC16A9. The SLC16A9 (MCT9) clone (IMAGE
ID 40146598) was purchased from Autogen Bioclear. Plasmid was linearized with
SpeI restriction enzyme (New England Biolabs) and complementary RNA was
synthesized in vitro using the T7 mMachine in vitro transcription system
(Ambion). MCT9 was expressed in Xenopus laevis oocytes as described previously.
Briefly, oocytes at stage V-VI were injected with 10 ng of MCT9 cRNA and
incubated in modified Barth’s solution for 3-4 days at 18°C with the medium
changed daily. Control oocytes had either no injection or an injection of an equal
volume (50 nl) of distilled water, and were incubated for the same length of time.
Uptake and efflux experiments were performed similarly to those described previously,
except that the substrate was [³H]-carnitine (specific activity 81 Ci mmol⁻¹,
GE Healthcare).

Data access. This study generated millions of individual data points through the 
profiling of n metabolites (n = 250) and n(n — 1)/2 ratios in about 2,800 indivi- 
duals, and the subsequent associations with millions of genetic variants from 
GWAS. We created a web-based interface and visualization tools for the dissem- 
ination of results to the scientific community, with the aim of allowing rapid 
storage and retrieval of data as well as managing the integration of metabolomics 
summary statistics vis-a-vis published GWAS studies. The association data are 
freely available at http://www.gwas.eu and at mirror sites located at the 
Wellcome Trust Sanger Institute and King’s College London sites. 

Web links. GWAS server: http://www.gwas.eu, SNAP:
http://www.broadinstitute.org/mpg/snap/, NHGRI catalogue of published GWAS:
http://www.genome.gov/gwastudies/, eQTL:
http://www.sanger.ac.uk/Software/analysis/genevar/, GRAIL:
http://www.broadinstitute.org/mpg/grail/, IPA (Ingenuity Pathway Analysis):
http://ingenuity.com, OMIM: http://www.ncbi.nlm.nih.gov/omim, yEd network
editor: http://www.yworks.com, BioGPS: http://biogps.gnf.org, Genecards:
http://www.genecards.org, WikiGenes: http://www.wikigenes.org, Pharmacogenomics
knowledge base: http://www.pharmgkb.org, R statistical analysis system:
http://www.r-project.org, KORA study population:
http://www.helmholtz-muenchen.de/kora/, TwinsUK study: http://www.twinsuk.ac.uk,
Metabolon Inc.: http://www.metabolon.com, MERLIN:
http://www.sph.umich.edu/csg/abecasis/Merlin, PLINK:
http://pngu.mgh.harvard.edu/~purcell/plink, R: http://www.r-project.org, SNPTEST:
http://www.stats.ox.ac.uk/~marchini/software/gwas/snptest.html

The mechanism of membrane-associated
steps in tail-anchored protein insertion


Malaiyalam Mariappan1*, Agnieszka Mateja2*, Malgorzata Dobosz2, Elia Bove2, Ramanujan S. Hegde1† & Robert J. Keenan2


Tail-anchored (TA) membrane proteins destined for the endoplasmic reticulum are chaperoned by cytosolic targeting
factors that deliver them to a membrane receptor for insertion. Although a basic framework for TA protein recognition is
now emerging, the decisive targeting and membrane insertion steps are not understood. Here we reconstitute the TA
protein insertion cycle with purified components, present crystal structures of key complexes between these
components and perform mutational analyses based on the structures. We show that a committed targeting complex,
formed by a TA protein bound to the chaperone ATPase Get3, is initially recruited to the membrane through an
interaction with Get2. Once the targeting complex has been recruited, Get1 interacts with Get3 to drive TA protein
release in an ATPase-dependent reaction. After releasing its TA protein cargo, the now-vacant Get3 recycles back to the
cytosol concomitant with ATP binding. This work provides a detailed structural and mechanistic framework for the
minimal TA protein insertion cycle.


Approximately 5% of eukaryotic membrane proteins are anchored to
the lipid bilayer by a single carboxy-terminal transmembrane domain
(TMD). These ‘tail-anchored’ proteins are found in virtually all cellular
membranes and perform essential functions in processes including
protein trafficking, degradation, cell death and membrane biogenesis.
TA proteins in compartments of the secretory and endocytic pathways
are first targeted to and inserted into the ER membrane by a
post-translational targeting pathway conserved across eukaryotes and
archaea.

This pathway begins with a ‘pre-targeting’ factor that captures newly
synthesized TA proteins through their TMDs near the ribosome. In
yeast, the pre-targeting factor is Sgt2, which assembles with Get3, Get4
and Get5 (also known as Mdy2) to form a TMD recognition complex.
Assembly of TMD recognition complexes permits substrates
to be transferred from Sgt2 to Get3 in an ATP-dependent
manner. Get3 (TRC40, or ASNA1, in mammals) is a homodimeric
ATPase whose conformation is regulated by its nucleotide state.
Both crystallographic and functional analyses support a model in
which an ATP-bound, ‘closed’ dimer of Get3 binds substrates in a large
hydrophobic groove that spans both subunits. This substrate-Get3-nucleotide
complex is therefore the committed targeting complex
(Supplementary Discussion).

In yeast, genetic and physical interaction studies have identified the 
ER-localized membrane proteins Get1 and Get2 as potential receptors 
for Get3 (refs 7, 21). It is not known whether Get1, Get2 and Get3 
constitute the minimal targeting and insertion machinery, how they 
function or what their essential roles are during TA protein insertion. 
In this Article, we combine functional reconstitution of TA protein 
insertion with structural analysis of key intermediate complexes to 
provide a mechanistic framework for the TA protein insertion cycle in 
Saccharomyces cerevisiae. 


The minimal insertion machinery 


We first reconstituted the TA protein insertion cycle with purified
recombinant factors. A functional TA protein targeting complex was
assembled and purified from in vitro translation reactions
(Supplementary Fig. 1). The complex contained radio-labelled and
epitope-tagged Sec61β (an ER-localized TA protein) bound to recombinant
yeast Get3 in roughly the 2:1 ratio expected from structural
studies. This recombinant targeting complex was functional as
judged by membrane insertion of Sec61β into ER-derived yeast rough
microsomes (yRMs) but not into protein-free liposomes (Fig. 1a).
Microsomes from ΔGet1 and ΔGet2 yeast strains showed little
insertion activity, whereas ΔGet3 microsomes were similar to wild-type
yRMs. Sec61β insertion efficiency with the purified targeting
complex was approximately two-fold higher than for Sec61β in crude
translation reactions (data not shown), consistent with the observation
that the latter contains a heterogeneous mixture of Sec61β
complexes with other factors. Thus, purified Get3-Sec61β is a
committed targeting complex for Get1- and Get2-dependent membrane
insertion.

The TA insertion defect of ΔGet1 and ΔGet2 microsomes is due
solely to loss of Get1 and/or Get2. To show this, purified recombinant
Get1 and Get2 (rGet1 and rGet2; Supplementary Fig. 2) produced
from Escherichia coli were added to detergent extracts prepared from
ΔGet1 or ΔGet2 yRMs, reconstituted into proteoliposomes and tested
for function (Supplementary Fig. 3). Proteoliposomes from ΔGet1
yRMs were inactive for TA protein insertion, but were restored by
replenishment with physiologic levels of rGet1 but not rGet2. ΔGet2
proteoliposomes required both rGet1 and rGet2 to restore insertion to
near wild-type levels (Supplementary Fig. 3), as expected because
Get1 is absent from ΔGet2 yRMs (Fig. 1a). We also biochemically
depleted Get1 and Get2 from wild-type yRM and showed that the
resulting insertion defect could be corrected by replenishment with
rGet1 and rGet2 but with neither individually (Supplementary Fig. 4).
Thus, rGet1 and rGet2 are fully functional in replacing their native
counterparts during Get3-dependent TA protein insertion.

The lack of membrane proteins co-purifying with Get1 and Get2
(Supplementary Fig. 5), and the absence of other membrane proteins
found in genetic studies, suggested that Get1 and Get2 are


1Cell Biology and Metabolism Program, National Institute of Child Health and Human Development, National Institutes of Health, Room 101, Building 18T, 18 Library Drive, Bethesda, Maryland 20892, USA.
2Department of Biochemistry & Molecular Biology, The University of Chicago, Gordon Center for Integrative Science, Room W238, 929 East 57th Street, Chicago, Illinois 60637, USA. †Present address: MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, UK.
*These authors contributed equally to this work.


1 SEPTEMBER 2011 | VOL 477 | NATURE | 61 


©2011 Macmillan Publishers Limited. All rights reserved 



Figure 1 | Reconstitution of TA protein insertion with purified
components. a, Yeast rough microsomes (yRMs) from the indicated strains
were tested for insertion of purified Get3-Sec61β targeting complex (top) or by
immunoblotting (bottom). The protease-protected fragment (PF) is diagnostic of
successful insertion. Liposomes are a negative control. PK, proteinase K; WT,
wild type. b, Quantification of Get1 and Get2 concentrations in yRMs by
immunoblotting. c, Protein composition of yRMs and proteoliposomes
containing recombinant proteins. Proteoliposomes in 20-fold relative excess
were analysed. d, Insertion of purified targeting complexes into liposomes, yRMs,
or rGet1/2 proteoliposomes. VAMP2 and Sed5, TMDs from rat VAMP2 or yeast
Sed5. Concentrations of the Get1/2 complex are indicated. FL, full length.
e, Relative efficiency of insertion of purified Get3-Sec61β targeting complex into
rGet1, rGet2 or rGet1/2 proteoliposomes. Autoradiographs and quantified data
are shown.

Figure 2 | Get2 fragment complex with ADP•AlF₄⁻-bound Get3.
a, Predicted topology of S. cerevisiae Get2 with its large cytosolic-facing region
(yellow). b, Structure of two Get2 fragments (yellow) bound to the closed Get3
dimer (green, blue). Two Mg²⁺-ADP•AlF₄⁻ complexes and a zinc atom are
indicated (spheres). An orthogonal view into the substrate-binding composite
hydrophobic groove is shown on the right. c, Get3 residues in the Get2 interface

sufficient for Get3-mediated TA protein insertion. Indeed, proteoliposomes
containing physiologic concentrations of only rGet1 and
rGet2 (Fig. 1b, c) were indistinguishable from yRM in mediating
insertion of three different purified TA protein targeting complexes
(Fig. 1d). Incorporating super-physiologic levels of rGet1 and rGet2
did not further improve insertion (Fig. 1d), and lower levels reduced
overall insertion efficiency (Fig. 1e).

The recombinant system required both rGet1 and rGet2 (Fig. 1e),
precisely mirroring the results in vivo and in crude proteoliposomes
(Supplementary Figs 3 and 4). Interaction analysis confirmed that
rGet1 and rGet2 form a complex through their membrane domains
in detergent solution (Supplementary Fig. 6), suggesting that during
reconstitution they are incorporated as a complex. Taken together, the
dependence on rGet1 and rGet2, their interaction with each other,
their functionality in replacing the endogenous proteins and the
high-efficiency insertion at native concentrations argue strongly that we
have reconstituted physiologically relevant TA protein insertion with
a defined targeting complex and only two membrane proteins.


The Get2c-Get3-ADP•AlF₄⁻ complex


Membrane targeting presumably involves an interaction between Get3
and the conserved cytosolic domains of Get1 and/or Get2 (Figs 2a and
3a and Supplementary Fig. 10). These fragments (‘Get1c’ and ‘Get2c’)
did not interact with each other (Supplementary Figs 6 and 7), but both
bound tightly to Get3 (Supplementary Figs 7 and 8) and inhibited the
insertion of Sec61β into yRMs (Supplementary Fig. 8). The ability of
Get3 to interact with either subunit of the Get1/2 complex suggested
that each interaction might serve a different purpose in the insertion
cycle.

The closed-dimer form of ADP•AlF₄⁻-bound Get3 probably
mimics the TA substrate-bound conformation that targets to the
membrane. This Get3-ADP•AlF₄⁻ complex crystallized with Get2c,
and we determined the structure to a resolution of 2.1 Å (Supplementary
Table 1 and Supplementary Fig. 9). The structure reveals Get3 in a
‘closed’-dimer conformation with ADP•AlF₄⁻ bound at each active
site (Fig. 2b). Two Get2 fragments, each comprising two helices
connected by a short linker, bind to equivalent sites on opposite faces of the
symmetric Get3 homodimer. Each interface buries ~960 Å² of surface
area, largely restricted to a single Get3 monomer (Fig. 2c, green, and


are indicated. Most contacts are to one Get3 monomer (green); poorly ordered
contacts are to the conserved A-loop ATPase motif. d, Close-up of interactions
along helix α1 of Get2, including Arg 17, Lys 20 and Phe 21. e, Close-up of
interactions along helix α2 of Get2, including the conserved salt bridge between
Arg 29 and Glu 253.

Supplementary Fig. 10). Get3 residues within the interface undergo
little conformational change on binding to Get2c (Supplementary Fig. 11).
The amino-terminal helix of Get2 lies in a cleft defined at one end by
short loops following helices α10 and α11 of Get3, and at the other end
by the loop following helix α9 and the extreme N terminus of Get3
(Fig. 2d). Three conserved, negatively charged residues in Get3, namely
Asp 265, Glu 307 and Asp 308, make direct contact with Get2c. The
second helix of Get2 lies in a cleft defined by Get3 helices α10 and α11
(Fig. 2e). This surface is largely hydrophobic except for a conserved salt
bridge between Glu 253 (Get3) and Arg 29 (Get2c). The C-terminal
end of the Get2 fragment, which is not conserved, makes poorly
ordered contacts with the adjacent Get3 monomer (Fig. 2c, blue).
The TA substrate-binding site in Get3 comprises a large hydrophobic
groove spanning the α-helical subdomains of both monomers. In the
Get2c-Get3 complex, this groove is intact (Fig. 2b and Supplementary
Fig. 20), suggesting that Get2 captures the closed Get3 targeting complex
without disrupting the TA binding site. The long, flexible linker
that tethers the helical N terminus of Get2 to its first TMD would
facilitate this process. Thus, we propose that the Get2c-Get3-ADP•AlF₄⁻
structure represents a snapshot of the initial encounter
between the closed-dimer targeting complex and the receptor.


The Get1c-Get3 complex


Get3 was also crystallized in the presence of Get1c. Whether or not
ADP•AlF₄⁻ was present during crystallization, the Get3-Get1c crystals
lacked nucleotide. We determined the structure of this nucleotide-free
complex to a resolution of 3.0 Å (Supplementary Table 1 and
Supplementary Fig. 9) and revealed Get3 in an ‘open’ conformation,
with two Get1 fragments bound to equivalent sites on opposite faces
of the symmetric Get3 homodimer (Fig. 3b). Each Get1 fragment
adopts an antiparallel coiled-coil structure and buries ~1,030 Å² of
surface area in a bipartite interface split evenly between the two Get3
subunits (Fig. 3c and Supplementary Fig. 10). As observed in the
Get2c complex, Get3 residues on the interface undergo little
conformational change on binding to Get1c (Supplementary Fig. 11).
Binding to one Get3 monomer is primarily mediated by hydrophobic

Figure 3 | Get1 fragment complex with Get3. a, Predicted topology of S.
cerevisiae Get1 with a large cytosolic-facing region (magenta). b, Structure of
two Get1 fragments (magenta) bound to the open dimer state of Get3 (green,
blue). The composite hydrophobic groove is completely disrupted. c, Get3
residues in the Get1 interface are indicated; significant contacts are made to
both monomers (green, blue), including the P-loop, switch I and switch II

contacts between helix α2 of Get1c and the cleft defined by helices α10
and α11 of Get3 (Fig. 3c, d, green). Binding to the other monomer is
mediated by helix α1 of Get1c, which interacts with Get3 helices α4
and α5, and by a six-residue loop in Get1c that directly contacts the
ATP-binding site (Fig. 3c, e, blue; see below).

Importantly, many of the Get3 residues that contact Get1c also
mediate interactions with Get2c (Supplementary Figs 10 and 11).
For example, the conserved Arg 73 (Get1c)/Glu 253 (Get3) salt bridge
almost perfectly mimics the Arg 29 (Get2c)/Glu 253 (Get3) interaction
(Figs 2e and 3d). The presence of overlapping binding sites
suggests that Get1 and Get2 cannot simultaneously occupy the same
site on Get3, as illustrated by dissociation of the Get3-Get2c complex
by Get1c (Supplementary Fig. 11). Previous work underscores the
functional significance of this region of Get3: alanine substitutions
within the shared interface, including F246A, Y250A, E253A and
Y298A, have a strong loss-of-function phenotype in yeast.
Moreover, two of these positions, Tyr 250 and Glu 253, have been
implicated in the ATP-dependent binding of Get4. Thus, the α10-α11
region of Get3 is a binding hotspot that probably plays an important
regulatory role at different stages of the targeting cycle.

The most striking aspect of the Get3-Get1c structure is how the
Get1 coiled coil wedges between the Get3 subunits, completely
disrupting the hydrophobic TA substrate-binding site (Fig. 3b). Such an
interaction could effect substrate release from the Get3 targeting complex.
However, parts of the bipartite Get1-binding site on Get3,
including the ATPase motifs and portions of helices α4 and α5
(Fig. 3c, blue), are buried in the ATP-bound, fully closed-dimer
conformation. By contrast, the bipartite Get1-binding site is largely
exposed to solvent in the Mg²⁺-ADP-bound state (Supplementary
Fig. 12). This implies that ATP hydrolysis by the targeting complex is
needed to expose the Get1-binding site on Get3 (Fig. 3c and
Supplementary Fig. 12, green and blue). Once exposed, Get1 would
complete the Get3 transition from closed to open, disrupting the
hydrophobic groove to promote release of the TA substrate and
ADP (which binds weakly to substrate-free Get3; Supplementary
Fig. 18). Importantly, the rigid Get1 coiled coil is perpendicular to




ATPase motifs. d, Close-up of interactions between Get1 helix α2 (magenta)
and one Get3 monomer (green), including the conserved salt bridge between
Arg 73 and Glu 253. This interface overlaps extensively with the Get2c binding
surface (Fig. 2e and Supplementary Fig. 11). e, Close-up of interactions between
the Get1 hairpin loop and the active site of the adjacent Get3 monomer (blue).

the plane of the membrane, thereby positioning the hydrophobic
groove of Get3 parallel to the membrane. This implies that the
TMD of a TA protein is precisely released along the membrane surface,
presumably facilitating its subsequent insertion.


Targeting and substrate release 


Conserved contacts between Get3-Get2 and Get3-Get1 were disrupted
with point mutations (R17E and R73E, respectively), verified
to prevent binding (Supplementary Fig. 13) and shown to reduce
insertion in the reconstituted system sharply (Fig. 4a). When the
substrate-Get3 interaction was monitored by crosslinking (Supplementary
Fig. 14), Get1c, but not Get2c, was found to release TA substrate
from Get3 (>50% at 500 nM; Fig. 4b). This activity was abolished by
the R73E mutation that disrupts Get3-Get1c interactions (Fig. 4c).
Thus, Get1c and Get2c both inhibit insertion (Supplementary Fig. 8),
but for different reasons: Get1c causes premature substrate release
whereas Get2c competitively precludes targeting.

When reconstituted into proteoliposomes at more-physiologic
concentrations, neither rGet1 nor rGet2 was able to effect substrate
release, whereas the complete rGet1/2 complex was active (Fig. 4d).
Importantly, disrupting the Get3-Get1 interaction (with R73E) or the
Get3-Get2 interaction (with R17E) abolished the ability of the rGet1/2
complex to stimulate substrate release (Fig. 4e). Thus, whereas Get1c at
super-physiologic concentrations can drive substrate release on its own,
full-length Get1 in the membrane is unable to do so at physiologic
levels. In this context, Get1 requires Get2 (specifically its ability to bind
Get3) to release substrate from Get3.

On the basis of the Get3-Get1c structure, ATP hydrolysis by the
Get3 targeting complex is likely to be necessary for its interaction with
Get1. Indeed, targeting complexes containing an ATPase-deficient
Get3 mutant (D57N) were poorly inserted into proteoliposomes
containing the rGet1/2 complex (Fig. 4f) despite no impairment of the
interaction of Get3 (D57N) with substrate or the rGet1/2 complex
(Supplementary Fig. 15 and data not shown). Analysis of the interaction
between TA substrate and Get3 (D57N) revealed that the
rGet1/2 complex was unable to induce release (Fig. 4d, e). Taken
together, the results of the functional analysis indicate that the
Get3-Get2 interaction is important for targeting, and that this step

Figure 4 | Mutational analysis of the function of Get1, Get2 and Get3.
a, Insertion assay with purified Get3-Sec61β targeting complex and
proteoliposomes containing the indicated purified proteins. Liposomes and yRM
are controls. Get1* and Get2* indicate mutants inactive in Get3 interaction
(R73E and R17E, respectively). b, Substrate release from targeting complexes
incubated with Get1c or Get2c; release was monitored by loss of the crosslink
(XL) between radio-labelled substrate and Get3. Square brackets indicate
concentration. c, As in b, with wild-type and mutant fragments at a concentration
of 0.5 µM. d, Substrate interaction with Get3 or the ATPase-deficient Get3
(D57N) was assessed by crosslinking after incubation with liposomes or
proteoliposomes containing the indicated recombinant proteins. e, As in d, but
comparing wild-type and mutant complexes of Get1 and Get2. f, Relative
efficiency of insertion (mean ± s.e.m.; n = 6) into rGet1/2 proteoliposomes with
targeting complexes prepared from wild-type Get3 or Get3 (D57N).

is a prerequisite for substrate release. Substrate release, in turn,
depends on both ATP hydrolysis by Get3 and the ability of Get3 to
interact with Get1.


ATP-dependent recycling 


The ATP that Get3 hydrolyses before substrate release is apparently 
acquired from the in vitro translation reaction (and maintained during 
purification) because insertion proceeds efficiently without additional 
ATP in the purified system (Fig. 5a). This is consistent with structural 
analysis suggesting that nucleotide is shielded from bulk solvent in the 
fully closed Get3-ATP-substrate ternary complex (Supplementary 
Discussion). However, we (Supplementary Fig. 16) and others®*’” have 
found that insertion reactions into crude yRMs, but not rGet1/2 
proteoliposomes, are stimulated by ATP, non-hydrolysable ATP ana- 
logues or ADP. The explanation for this discrepancy proved to be the 
near-stoichiometric presence of Get3 on the Get1/2 complex in yRMs 
(Supplementary Fig. 5), but not on rGet1/2 proteoliposomes.
Accordingly, binding Get3 to rGet1/2 proteoliposomes restored ATP 
dependence (Fig. 5b), whereas removing Get3 from yRM (by using 
ΔGet3 yeast) eliminated the ATP requirement for maximal insertion
(Fig. 5c). 

These results indicate that after TA substrate release, Get3 remains 
bound to microsomal membranes. In the nucleotide-free Get3-Get1c
structure, which mimics this ‘post-insertion’ complex, residues within
the conserved loop of Get1 (°ISAQDN™) insert into the Get3 active
site (Fig. 3e) and deform it relative to the ADP·AlF4⁻-bound conformation
(Fig. 5d). Modelling ATP into the active site reveals steric
and electrostatic clashes between Get1 and ATP, suggesting that free 
ATP should displace Get3 from Get1. Indeed, the Get3-Get1c inter- 
action was quantitatively disrupted by micromolar concentrations of 
ATP (Fig. 5e). ADP was far less effective, and AMP failed to disrupt 
the Get3-Get1c complex. This ATP-dependent Get3 dissociation was
also verified with full-length Get1 using pull-down assays (Supplementary
Fig. 19). By contrast, none of the tested nucleotides
disrupted Get2c binding to Get3 (Fig. 5f). Thus, free ATP binding
dissociates the Get1-Get3 complex to recycle Get3 from the membrane
after TA substrate release.

Figure 5 | ATP-dependent recycling of empty Get3 from Get1. a, Insertion
activity of purified Get3-Sec61β targeting complex using the indicated vesicles
with or without an ATP regenerating system. b, Proteoliposomes containing
the rGet1/2 complex, or the rGet1/2 complex bound to Get3 (left panel), were
tested for insertion activity of purified targeting complex in the presence or
absence of ATP (right panel). Coom., Coomassie blue. c, Purified targeting
complex was tested for insertion into wild-type yRMs or those from a ΔGet3
strain, with or without ATP. d, Close-up of the Get1c-Get3 complex (magenta
and blue) modelled onto the active site of the closed, ADP·AlF4⁻-bound Get3
dimer (grey). Steric (dashed lines) and electrostatic clashes between conserved
residues in Get1 and the nucleotide γ-phosphate are apparent. e, Dissociation
of Get3-Get1c, monitored by the change in fluorescence resonance energy
transfer (ΔF), on titration with the indicated nucleotides. Curve fits of triplicate
measurements (mean ± s.e.m.) are shown. a.u., arbitrary units. The reaction
contained 10 nM Get3 (D57N) and 100 nM Get1c. f, As in e, but with 10 nM
Get3 (D57N) and 200 nM Get2c.

Figure 6 | Model for TA protein insertion. Nucleotide- and tail-anchored
substrate-bound Get3 in a closed-dimer conformation forms the ‘docked
complex’ by association with Get2. D, ADP; T, ATP. Following ATP hydrolysis,
Get1 interacts with and orients Get3 along the membrane surface. This
stabilizes the open-dimer conformation of Get3, disrupts the composite
hydrophobic groove and promotes TA substrate release for membrane
insertion. The Get3-Get1 post-insertion complex is dissociated by ATP
binding, recycling Get3 back to the cytosol. See Supplementary Discussion for
more details.


A model for the insertion cycle 


Figure 6 illustrates our working framework for the insertion cycle. 
Substrate-bound Get3 in the closed conformation and loaded with 
nucleotide (either ATP or ADP; see Supplementary Discussion) is 
captured at the membrane by the cytosolic domain of Get2. The 
apparently long and flexible Get2 tether may facilitate this initial 
encounter and bring the intact targeting complex near to the site of 
insertion. After this targeting step, Get1 mediates the post-targeting
reactions of substrate release and insertion. Get1 binding to the
targeting complex would be facilitated by partial destabilization of 
the closed dimer after ATP hydrolysis, and by the high local concen- 
tration of Get3 achieved by its recruitment through Get2. Binding to 
the rigid Get1 coiled coil would orient Get3 such that the substrate is 
in close proximity to the membrane. Moreover, by stabilizing the 
open conformation, Get1 binding would disrupt the Get3 hydrophobic
groove and promote release of substrate and ADP. At present,
we do not know whether the Get1/2 complex functions as a hetero- 
dimer or heterotetramer, although we favour the latter given the sym- 
metric structure of the Get3 dimer. The released substrate would insert 
unassisted into the lipid bilayer directly*°’’ or would be chaperoned by 
the TMDs of the Get1/2 complex. Finally, the empty Get3 would be 
released from Get1 concomitant with ATP binding, and would be
primed to accept the next substrate from the cytosolic pre-targeting 
complex for another round of targeting. 
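
The order of events in this working model can be written down as a short list of state transitions. The Python sketch below is only a bookkeeping aid for the cycle described above and in Fig. 6. The state and event names are labels chosen here for illustration (they are not nomenclature from the paper), and the code does not model kinetics or energetics.

```python
# Illustrative bookkeeping of the proposed insertion cycle (Fig. 6).
# State and event names are hypothetical labels, not terms defined by the paper.
CYCLE = [
    ("targeting_complex", "capture by the Get2 cytosolic domain", "docked_complex"),
    ("docked_complex", "ATP hydrolysis, then Get1 binding", "get1_bound_open_dimer"),
    ("get1_bound_open_dimer", "release of substrate and ADP", "post_insertion_complex"),
    ("post_insertion_complex", "ATP binding", "free_get3"),
    ("free_get3", "substrate loading from the pre-targeting complex", "targeting_complex"),
]

# Map each state to the (event, next_state) pair that leaves it.
TRANSITIONS = {state: (event, nxt) for state, event, nxt in CYCLE}


def walk(start, steps):
    """Return the states visited when following `steps` transitions from `start`."""
    path = [start]
    for _ in range(steps):
        _, nxt = TRANSITIONS[path[-1]]
        path.append(nxt)
    return path


if __name__ == "__main__":
    for state, event, nxt in CYCLE:
        print(f"{state} --[{event}]--> {nxt}")
```

Following five transitions from the targeting complex returns to the targeting complex, which is the sense in which Get3 is recycled for another round.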


METHODS SUMMARY 

Reagents and assays. Constructs, proteins and antibodies derived from earlier
studies*'*'° are described in Methods. Antibodies against Get1 and Get2 were produced
in rabbits. In vitro translation, insertion, crosslinking and immunoprecipitation
were as described previously*"*”*. Get1 and Get2 (full length and fragments) were
expressed in E. coli and purified by Ni-NTA chromatography; fragments were further
purified by size exclusion chromatography. ³⁵S-labelled targeting complexes were
affinity-purified from in vitro translation reactions containing recombinant Get3.

Liposomes, microsomes and proteoliposomes. Liposomes containing a 4:1 ratio
of egg phosphatidylcholine and dipalmitoylphosphatidylethanolamine were prepared
by extrusion”. Yeast rough microsomes were prepared as before’.
Proteoliposome reconstitutions from solubilized yRMs or purified Get1 and/or
Get2 were done by optimizing (Supplementary Fig. 17) earlier methods*****.

Interaction analysis. Binding assays were performed by gel filtration and multi-angle
light scattering, pull-down assays or fluorescence resonance energy transfer.
Substrate release was monitored by amine-reactive crosslinking’.

Structure determination. Complexes of Get3 with Get1c or Get2c were co-expressed
in E. coli and purified by Ni-NTA and size exclusion chromatography.
Diffraction data were collected at Advanced Photon Source beamline 21-IDG,
Argonne National Laboratory. Structures were determined by molecular replacement
in PHASER”. Refinement and model building were done using PHENIX*®
and COOT”.
Full Methods and any associated references are available in the online version of
the paper at www.nature.com/nature.
METHODS


Reagents and basic procedures. Antibodies against Get1 (residues 61-74) and 
Get2 (residues 2-12) were generated against synthetic peptides conjugated to 
KLH via terminal cysteines. Antibody against yeast Get3 was against the whole 
recombinant protein. Antibody production was by LAMPIRE Biological 
Laboratories. The antibodies against the 3F4 tag and Sec61 have been described 
previously*. The Sec6l% antibody was a gift from Tom Rapoport (Harvard 
University). DeoxyBigCHAP (DBC) was obtained from Calbiochem. Yeast 
strains were from Open Biosystems collections and were provided by Tom 
Dever. The following lipids were obtained from Avanti Polar Lipids: egg phos- 
phatidylcholine (PC), 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphoethanolamine 
(PE) and 1,2-dipalmitoyl-sn-glycero-3-phosphoethanolamine-N-lissamine rho- 
damine B (rhodamine-PE). Each lipid was dissolved and stored in chloroform at 
−20 °C or −80 °C. Protease inhibitor cocktail was from Roche (EDTA-free
Complete tablets) and dissolved as a 25× stock in aqueous buffer just before
use. In vitro translation, chemical crosslinking and immunoprecipitations were as 
described previously*'***. 

Preparation of proteins for functional analysis. The genes encoding full-length 
or cytosolic fragments of S. cerevisiae Get1, Get2 and Get3 were amplified by PCR 
from genomic DNA. Site-directed mutants were obtained by QuikChange muta- 
genesis (Stratagene). Unless otherwise noted, all constructs were subcloned into a 
pET28 derivative (Novagen) modified to incorporate a tobacco etch virus (TEV) 
protease cleavage site between an N-terminal 6×His tag and the polylinker. All
constructs were verified by DNA sequencing. 

Expression and purification of full-length Get3 (wild type and D57N) were
carried out as described previously’®. Full-length Get1 and Get2 (wild type and
mutants) were expressed in E. coli Rosetta2/pLysS (Novagen) using the Overnight
Express Autoinduction System 1 (Novagen). Cells were disrupted in buffer A
(50 mM HEPES, pH 8.0, 500 mM NaCl, 10 mM imidazole, 5% glycerol) with
1 mM PMSF using a high-pressure microfluidizer (Avestin), and the insoluble
pellet was isolated by centrifugation. This pellet was washed in buffer A, recentrifuged
and solubilized for 1 h at 4 °C in buffer A containing 0.5% n-dodecyl-
N,N-dimethylamine-N-oxide (LDAO). The detergent-soluble fraction was then
subjected to nickel-affinity chromatography (Ni-NTA agarose, Qiagen) in buffer A
containing 30 mM imidazole and 0.1% LDAO. Protein was eluted at ~1 mg ml⁻¹
in buffer A containing 200 mM imidazole and 0.1% LDAO, and stored in aliquots
at −80 °C. Protein concentrations were determined using calculated A280 extinction
coefficients.

The cytosolic Get1 fragment (residues 21-104) was expressed for 3 h at 37 °C
(wild type) or overnight at 25 °C (R73E mutant) in E. coli BL21(DE3)/pRIL
(Novagen), following induction with 0.1 mM IPTG. Cells were disrupted in buffer
B (50 mM Tris, pH 7.5, 500 mM NaCl, 10 mM imidazole, 5% glycerol, 5 mM
β-mercaptoethanol) with 1 mM PMSF using a microfluidizer. After clearing by
centrifugation, the supernatant was batch-purified by nickel-affinity chromatography.
Protein was eluted in buffer B containing 200 mM imidazole, dialysed
into 10 mM Tris, pH 7.5, 250 mM NaCl and 40% glycerol, and then stored at
−80 °C. This was typically followed by gel filtration (Superdex 200 10/300 GL, GE
Healthcare) in 10 mM Tris, pH 7.5, and 200 mM NaCl. Fractions were pooled and
stored in aliquots at −80 °C. Protein concentrations were determined using
calculated A280 extinction coefficients.

The cytosolic Get2 fragment (residues 1-38 or 1-106; wild type and R17E) was 
expressed with an N- or C-terminal 6×His tag overnight at 25 °C and purified by
nickel-affinity chromatography as described above for the Get1 fragment. After
dialysis against 10 mM Tris, pH 7.5, and 200 mM NaCl, proteins were further
purified by gel filtration in 10 mM Tris, pH 7.5, and 150 mM NaCl. Fractions were
pooled, concentrated and stored in aliquots at −80 °C. Protein concentration was
determined by BCA (Pierce). 

Preparation of liposomes. The standard liposome mixture typically contained
PC:PE:rhodamine-PE at a mass ratio of 8:1.9:0.1. Rhodamine-PE serves as a
tracer to follow the lipid recovery. Lipid solutions were mixed in the above ratios
as chloroform stocks, adjusted to 10 mM DTT and dried in a glass tube by
centrifugation under vacuum (SpeedVac, Eppendorf) for 12 h. Lipid films were
hydrated to a final concentration of 20 mg ml⁻¹ in lipid buffer (50 mM HEPES-KOH,
pH 7.4, 15% glycerol) and mixed end to end for 6 h at 25 °C with intermittent
vortexing. The milky and uniform suspension was subjected to three freeze-thaw
cycles (freeze in liquid nitrogen; thaw at 37 °C) and extruded at 65 °C 11 times
through 100-nm polycarbonate membranes using an Avanti mini-extruder’’”’.
Single-use aliquots (100 µl) of the final clear liposome solution were flash-frozen
in liquid nitrogen and stored at −80 °C.
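
As a quick check on the quantities above, the lipid masses for a batch follow directly from the 8:1.9:0.1 mass ratio and the 20 mg ml⁻¹ hydration concentration. The Python sketch below is an illustrative calculation, not part of the published protocol. The function names are invented here, and it assumes that no lipid is lost during drying, hydration or extrusion.

```python
# Mass ratio PC : PE : rhodamine-PE and hydration concentration as stated in the text.
MASS_RATIO = {"PC": 8.0, "PE": 1.9, "rhodamine-PE": 0.1}
TOTAL_LIPID_MG_PER_ML = 20.0


def lipid_masses_mg(final_volume_ml):
    """Mass of each lipid (mg) needed for `final_volume_ml` of hydrated suspension."""
    total_mg = TOTAL_LIPID_MG_PER_ML * final_volume_ml
    ratio_sum = sum(MASS_RATIO.values())
    return {name: total_mg * part / ratio_sum for name, part in MASS_RATIO.items()}


def lipid_in_aliquot_ug(volume_ul):
    """Total lipid (micrograms) in `volume_ul` of suspension, assuming full recovery."""
    # mg per ml is numerically equal to micrograms per microlitre.
    return TOTAL_LIPID_MG_PER_ML * volume_ul


if __name__ == "__main__":
    print(lipid_masses_mg(1.0))       # masses for 1 ml of suspension
    print(lipid_in_aliquot_ug(10.0))  # lipid in a 10-microlitre addition
```

Under these assumptions a 10-µl addition carries 200 µg of lipid, which matches the amount quoted for the reconstitution reactions.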

Purification of recombinant targeting complex. The DNA template for the
double-Strep-tagged human Sec61β was generated by PCR using a 5′ oligonucleotide
that encodes the T7 promoter, start codon and tag. This template was transcribed
and translated in RRL as described previously”*, but with 0.15 mg ml⁻¹
His-Get3 (added from a 20 mg ml⁻¹ stock in 10 mM Tris-HCl, pH 7.5, 100 mM
NaCl and 40% glycerol). A 2-ml translation reaction was diluted twofold with ice-cold
column buffer (20 mM HEPES-KOH, pH 7.4, 100 mM potassium acetate,
2 mM magnesium acetate, 1 mM DTT) and centrifuged for 30 min at 540,960g
in a TLA100.3 rotor at 4 °C. The post-ribosomal supernatant was bound to a 400-µl
DEAE-Sepharose fast-flow column at 4 °C, washed with column buffer and eluted
with a buffer containing 50 mM HEPES-KOH, pH 7.4, 320 mM potassium acetate,
7 mM magnesium acetate and 1 mM DTT. The eluate was passed over 200 µl
Strep-Tactin agarose (IBA, Germany) one to three times. After washing with four
column volumes of Strep-Tactin buffer (50 mM HEPES-KOH, pH 7.4, 10%
glycerol, 150 mM potassium acetate, 7 mM magnesium acetate, 1 mM DTT) at
4 °C, bound proteins were eluted with 5 × 50 µl Strep-Tactin buffer containing
10 mM desthiobiotin (Novagen). The peak fractions, measured by counting radioactivity,
were pooled. The final sample contained ~10,000 c.p.m. µl⁻¹. The concentration
of Get3 in the final sample was estimated to be ~80 nM. Thus, the
targeting complex in our typical preparation has a concentration of ~40 nM,
assuming a 2:1 ratio of Get3 to Sec61β. This was either used immediately or frozen
in aliquots in liquid nitrogen and stored at −80 °C. Targeting complexes containing
the TMDs of rat VAMP2 and S. cerevisiae Sed5 in place of the Sec61β TMD
were made similarly.
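
The concentration estimate above is simple stoichiometry, and the same arithmetic gives the amount of complex in a given volume. The Python sketch below is illustrative only and its function names are invented here. Like the estimate in the text, it assumes that all Get3 in the preparation is in complex with substrate at a 2:1 ratio.

```python
def targeting_complex_nM(get3_nM, get3_per_substrate=2.0):
    """Complex concentration, assuming all Get3 is substrate-bound at the given ratio."""
    return get3_nM / get3_per_substrate


def picomoles(conc_nM, volume_ul):
    """Amount of solute (pmol) in `volume_ul` at `conc_nM`."""
    # 1 nM corresponds to 1 fmol per microlitre; divide by 1000 to convert fmol to pmol.
    return conc_nM * volume_ul / 1000.0


if __name__ == "__main__":
    complex_nM = targeting_complex_nM(80.0)  # ~80 nM Get3, as estimated in the text
    print(complex_nM)
    print(picomoles(complex_nM, 8.0))        # amount in an 8-microlitre aliquot
```

For the ~80 nM Get3 quoted above this gives ~40 nM complex, or roughly 0.3 pmol in an 8-µl aliquot.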

Insertion assay. The post-translational insertion assay was performed as described
before®, with the following minor modifications. For a standard reaction, 8 µl of
purified targeting complex was mixed with 1 µl of ATP regenerating system
(2 mM ATP, 10 mM creatine phosphate and 40 µg ml⁻¹ creatine kinase) and
1 µl of yRMs, liposomes, reconstituted proteoliposomes or a matched buffer.
The ATP regenerating system was omitted in some reactions, as indicated in the figure
legends. After incubation at 32 °C for 30 min, the samples were treated with
proteinase K (0.5 mg ml⁻¹) for 60 min on ice; the protease digestion was
terminated with 5 mM PMSF and the samples were transferred to 100 µl of boiling 1% SDS as
described previously*. The protease-protected fragment was then immunoprecipitated
using the 3F4 antibody directed against the C terminus of the Sec61β
construct. Immunoprecipitated products were analysed by SDS-polyacrylamide
gel electrophoresis (SDS-PAGE) and quantified by phosphor imaging.

Preparation of rough microsomes from yeast. Yeast microsomes were prepared
by modifications of the methods previously described*®**°*. TAP-tagged Get
(Open Biosystems) or Get deletion strains (gift from T. Dever) were grown at
30 °C to a density of 2 A600 units in 1 l of YPD medium containing 2% glucose. Cells
were collected by centrifugation at 3,000g for 5 min and washed twice with ice-cold
distilled water. All subsequent steps were on ice or at 4 °C. The cell pellet was
resuspended in 50 ml of homogenization buffer (20 mM HEPES-KOH, pH 7.4,
100 mM potassium acetate, 2 mM magnesium acetate) and centrifuged for 5 min
at 3,000g. The resulting cell pellet was resuspended in homogenization buffer
containing 2 mM DTT and protease inhibitor cocktail (Roche) at a concentration
of 1 ml per gram of cell pellet. Pre-chilled glass beads were added (3 g ml⁻¹ of
suspension), and cell lysis was induced as follows: the tube was vigorously shaken
up and down over a 50-cm path at ~1-2 cycles s⁻¹ for three 1-min periods
separated by 1 min chilling on ice. Approximately 50% of the cells were broken
by this method as visualized by microscopy. The fluid phase was drained off
through a fine nylon mesh into a JA17 tube and spun at 10,000g for 10 min.
The post-mitochondrial supernatant was briefly centrifuged in an MLA80 rotor
at 339,707g for 8 min. Each 2 ml of the clear supernatant was layered on 1 ml of a
0.67 M sucrose cushion in homogenization buffer and centrifuged for 30 min in a
TLA100.3 rotor at 265,070g. The resulting membrane pellet was resuspended in
homogenization buffer containing 250 mM sucrose and 2 mM DTT to a final
standard concentration of 100 A280 (measured after solubilization in 1% SDS). At
this concentration, 1 µl of yRM is defined as two equivalents (equiv.). One litre of
culture yielded about 2,400 equiv. Aliquots were frozen in liquid nitrogen and
stored at −80 °C.
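
The 'equivalent' used here is a volume-times-absorbance unit, so conversions between volume, A280 and equivalents are easy to get wrong. The Python sketch below works through the definition given above (1 µl at 100 A280 = 2 equiv.). It is an illustrative helper, not part of the published protocol: the function names are invented here, and the linear scaling with A280 is an assumption implied by the definition rather than stated explicitly.

```python
STANDARD_A280 = 100.0           # standard yRM concentration given in the text
EQUIV_PER_UL_AT_STANDARD = 2.0  # 1 microlitre at 100 A280 is defined as 2 equiv.


def equivalents(volume_ul, a280=STANDARD_A280):
    """Equivalents in `volume_ul` of membranes, assuming linear scaling with A280."""
    return volume_ul * EQUIV_PER_UL_AT_STANDARD * a280 / STANDARD_A280


def volume_ul_for(equiv, a280=STANDARD_A280):
    """Volume (microlitres) that contains `equiv` equivalents at the given A280."""
    return equiv * STANDARD_A280 / (EQUIV_PER_UL_AT_STANDARD * a280)


if __name__ == "__main__":
    print(volume_ul_for(2400.0))      # volume of the yield from one litre of culture
    print(equivalents(1500.0, 50.0))  # 1.5 ml of membranes diluted to 50 A280
```

On this reading, the ~2,400 equiv. obtained per litre of culture corresponds to about 1.2 ml at the standard concentration, and a suspension at 1 equiv. µl⁻¹ would have an A280 of 50.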

Depletion of Get1 and Get2 from microsomal extract. TAP-Get1 yRMs (1.5 ml, 
or 1,500 equiv.) were adjusted to 1% DBC in solubilization buffer (50 mM 
HEPES-KOH, pH 7.4, 500mM potassium acetate, 5mM magnesium acetate, 
250mM sucrose, 1mM DTT and protease inhibitor cocktail). After 10 min 
incubation on ice, the detergent extract was centrifuged at 540,960g for 30 min 
in a TLA100.3 rotor at 4°C. The supernatant (yRM extract) was incubated with 
0.1 ml of IgG Sepharose (GE Healthcare) for 1 h at 25 °C. The unbound fraction 
was incubated with 0.1 ml of anti-Get2 antibodies coupled to protein-A agarose 
for 1h at 25 °C. The flow-through was finally incubated with a mixture of 0.1 ml 
each of anti-Get1 and anti-Get2 antibodies coupled to protein-A agarose for 1 h
at 25 °C. The flow-through from this column was used for reconstitution studies. 
It should be noted that a residual amount of the Get1/2 complex is sufficient to 
achieve the maximal insertion under in vitro conditions. Therefore, multiple 
rounds of depletion of the Get1/2 complex (with at least ~95% depletion) were 
necessary to fully deplete insertion activity. For purification of TAP-Get1 (and
associated proteins), the IgG Sepharose resin from above was washed with low-salt
buffer (10 mM Tris, pH 7.4, 150 mM NaCl, 10% glycerol, 0.25% DBC and
1 mM DTT) and eluted with 70 U TEV protease (Invitrogen) overnight at 4 °C.
The TEV elution was adjusted to 2.5 mM CaCl2 and incubated with calmodulin
Sepharose (GE Healthcare) for 90 min at 4 °C. The beads were washed with
low-salt buffer containing CaCl2 and eluted with low-salt buffer containing
5 mM EGTA. The eluted proteins were precipitated with TCA and analysed by
SDS-PAGE.

Reconstitution of proteoliposomes from microsome extracts. Following earlier
methods***?*4, yRMs were adjusted to a concentration of 1 equiv. µl⁻¹ in the
following conditions: 50 mM HEPES-KOH, pH 7.4, 500 mM potassium acetate,
5 mM magnesium acetate, 250 mM sucrose, 1 mM DTT, 1% DBC and protease
inhibitor cocktail. After 10 min on ice, the ribosomes were removed by centrifugation
at 540,960g for 30 min in a TLA100.3 rotor at 4 °C. Typically, 100 µl of this
clarified yRM extract was mixed with 10 µl of liposomes (200 µg) and 50 mg of
Biobeads SM2 (Bio-Rad). The Biobeads were prewashed extensively ahead of
time with methanol and water. The mixture was incubated for 12-16 h with gentle
overhead mixing at 4 °C. The fluid phase was separated from the beads, diluted
with five volumes of ice-cold distilled water and sedimented in a TLA100.3 rotor
in micro-test tubes at 304,290g for 30 min at 4 °C. The proteoliposomes were
resuspended in 25 µl of membrane buffer (50 mM HEPES-KOH, pH 7.4, 100 mM
potassium acetate, 5 mM magnesium acetate, 250 mM sucrose, and 1 mM DTT).

Reconstitution of proteoliposomes with purified proteins. The optimum
method for reconstitution of purified Get1 or Get2 was empirically determined after
testing various detergents and reconstitution methods (Supplementary Fig. 17). The
precise method of reconstitution proved to be important for obtaining maximally
functional proteoliposomes. The incorporation and activity of Get1 and Get2 varied
with different detergents. Of those tested, DBC gave the highest
activity of Get1 and Get2. Every batch of DBC requires some degree of
optimization with respect to the amount of Biobeads used for detergent removal. For
a standard reconstitution reaction, 100 µl of reconstitution buffer (50 mM HEPES-KOH,
pH 7.4, 500 mM potassium acetate, 5 mM magnesium acetate, 250 mM
sucrose, 1 mM DTT, 0.25% DBC) was mixed with 10 µl of liposomes (200 µg)
and purified Get1 or Get2 at the desired concentration. For preparation of
liposomes used as controls in the assays, purified proteins were omitted. This
mixture was added to between 25 and 30 mg of Biobeads (optimized for each batch
of DBC), and incubated with overhead mixing for 12 h at 4 °C. The fluid phase was
separated and diluted with five volumes of ice-cold water. In some instances, the
proteoliposomes were mixed with Get3 and incubated for 15 min at 25 °C, followed
by 30 min at 4 °C with shaking, to allow binding. After dilution, the liposomes
were sedimented in a TLA100.3 rotor in micro-test tubes at 304,290g for
30 min at 4 °C. The proteoliposomes were resuspended in 25 µl of membrane
buffer as above. SDS-PAGE Coomassie staining and immunoblots were performed
to assess the efficiency of protein incorporation; the rhodamine-PE served
as a marker for lipid recovery. Typical recovery for Get1 and Get2 reconstitution
was ~50%.

Multi-angle light scattering. The absolute molecular masses of individual 
proteins and complexes were measured by static multi-angle light scattering. 
Purified samples were injected onto a Superdex 200 HR 10/30 gel filtration 
column (GE Healthcare) equilibrated with 10 mM Tris, pH 7.5, 150 mM NaCl 
and 2mM DTT. The purification system was coupled to an online, static, light- 
scattering detector (Dawn HELEOS II, Wyatt Technology), a refractive-index 
detector (Optilab rEX, Wyatt Technology) and an ultraviolet-light detector
(UPC-900, GE Healthcare). Absolute weight-averaged molar masses were calcu- 
lated using the ASTRA software (Wyatt Technology). 

Receptor fragment binding assays. Gel-filtration-purified, 6×His-tagged
Get1(21-104), Get2(1-106) and Get3 (wild type and D57N) proteins were
labelled with amine-reactive succinimidyl esters of Alexa488 or Alexa594
(Invitrogen). Labelling reactions were carried out by incubating ~150 µM protein
and ~600 µM dye for 1 h at room temperature (23 °C) in 100 mM NaHCO3,
pH 8.3, and 200 mM NaCl. After labelling, proteins were desalted and concentrated
in Amicon Ultra filtration units (Millipore) to ~100 µM in 20 mM HEPES,
pH 7.5, and 200 mM NaCl (receptor fragments) or 20 mM HEPES, pH 7.5,
200 mM NaCl and 2 mM DTT (Get3), and stored in aliquots at −80 °C.
Protein concentration was determined using calculated A280 extinction coefficients
after correcting for dye absorbance. Under these labelling conditions, we
typically observed ~0.5-1.5 mol of dye per mole of protein.
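
The dye correction mentioned above is usually applied with the standard two-wavelength calculation: the dye's contribution at 280 nm is subtracted before converting A280 to protein concentration. The Python sketch below illustrates that general formula and is not the authors' own procedure. The function name is invented here, and the extinction coefficients and 280-nm correction factor are inputs that must be taken from the dye and protein data, not values given in this paper.

```python
def degree_of_labelling(a280, a_dye_max, eps_protein, eps_dye, cf280, path_cm=1.0):
    """Return (protein concentration in M, mol dye per mol protein).

    a280        measured absorbance at 280 nm
    a_dye_max   measured absorbance at the dye's absorption maximum
    eps_protein protein extinction coefficient at 280 nm (M^-1 cm^-1)
    eps_dye     dye extinction coefficient at its maximum (M^-1 cm^-1)
    cf280       dye correction factor, i.e. its A280 / A_max ratio
    """
    # Remove the dye's own absorbance at 280 nm before applying Beer-Lambert.
    protein_molar = (a280 - cf280 * a_dye_max) / (eps_protein * path_cm)
    dye_molar = a_dye_max / (eps_dye * path_cm)
    return protein_molar, dye_molar / protein_molar


if __name__ == "__main__":
    # All numbers below are placeholders for illustration, not measured data
    # and not specifications of any particular dye.
    protein_molar, dol = degree_of_labelling(
        a280=0.5, a_dye_max=0.71, eps_protein=40000.0, eps_dye=71000.0, cf280=0.1
    )
    print(protein_molar, dol)
```

With the placeholder inputs shown, the ratio comes out at a little under 1 mol of dye per mole of protein, inside the ~0.5-1.5 range reported above.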

Dissociation constants (Kd) were determined by titrating a fixed amount of
labelled, nucleotide-free Get3 with labelled Get1c or Get2c. Fluorescence measurements
were made in 96-well format using a Safire2 (Tecan) plate reader.
Alexa594-labelled fragments were excited by fluorescence resonance energy
transfer from Alexa488-labelled Get3 (wild type or D57N), using excitation
and emission wavelengths of 495 and 615 nm, respectively. All experiments were
carried out in 150 µl of 50 mM HEPES, pH 7.5, 100 mM NaCl, 5 mM MgCl2, 5%
glycerol, 0.02% Tween20 and 2 mM DTT. Blank titrations were carried out in the
absence of labelled Get3 and were subtracted from the respective titration curves
obtained in the presence of labelled Get3. The difference curves were evaluated by
nonlinear regression using the following quadratic binding equation:
$\Delta Y = 0.5\,(B_{\max}/P)\,\bigl(K_d + P + X - \sqrt{(K_d + P + X)^2 - 4PX}\bigr)$, where $B_{\max}$ is the
amplitude, $P$ is the total concentration of labelled Get3, and $X$ is the total concentration
of labelled Get1c or Get2c.
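
The quadratic binding equation above is the exact solution of 1:1 mass action when neither partner is in large excess, which is the situation here because the Get3 concentration is comparable to the dissociation constant. The Python sketch below is a minimal illustration of the model function only. It is not the authors' fitting code, the function names are invented here, and the example concentrations are hypothetical.

```python
import math


def complex_concentration(x, kd, p):
    """Concentration of 1:1 complex for total titrant `x` and total labelled Get3 `p`.

    All three arguments must be in the same concentration units.
    """
    s = kd + p + x
    return 0.5 * (s - math.sqrt(s * s - 4.0 * p * x))


def quadratic_binding(x, kd, p, bmax):
    """Signal change: the amplitude `bmax` scaled by the fraction of `p` in complex."""
    return bmax * complex_concentration(x, kd, p) / p


if __name__ == "__main__":
    # Hypothetical example: 10 nM labelled Get3 and a 20 nM dissociation constant.
    for x in (0.0, 10.0, 100.0, 1000.0):
        print(x, quadratic_binding(x, kd=20.0, p=10.0, bmax=1.0))
```

In practice this function would be handed to a nonlinear least-squares routine with the dissociation constant and the amplitude as free parameters and the labelled Get3 concentration held fixed. The signal is zero with no titrant and approaches the amplitude at saturating titrant.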

Chase titrations were carried out by measuring fluorescence resonance energy
transfer between Alexa488-labelled Get3 (wild type or D57N) and Alexa594-labelled
fragments in the presence of increasing concentrations of an unlabelled
fragment or nucleotide. Blank titrations were performed in the absence of labelled
Get3 and were subtracted from the respective titration curves obtained in the
presence of labelled Get3. The difference curves were evaluated by nonlinear regression
using the following equation: $\Delta Y = F_{\mathrm{end}} + B_{\max}P/\bigl(P + K_{d,\mathrm{labelled}}(1 + X/K_d)\bigr)$,
where $F_{\mathrm{end}}$ is the signal at saturation, $B_{\max}$ is the amplitude of the signal change, $P$ is
the total concentration of labelled fragment, $K_{d,\mathrm{labelled}}$ is the dissociation constant of
the Get3-fragment complex, $X$ is the total concentration of the unlabelled component
and $K_d$ is the dissociation constant of the unlabelled component.
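
The chase equation above is a competitive-binding model in which the unlabelled component raises the apparent dissociation constant of the labelled pair by the factor (1 + X/Kd). The Python sketch below is a minimal illustration of that equation, not the authors' fitting code. The function names are invented here, and the half-displacement expression is derived algebraically from the equation itself rather than taken from the paper.

```python
def chase_signal(x, f_end, bmax, p, kd_labelled, kd_competitor):
    """Signal at unlabelled competitor concentration `x` (competitive model)."""
    return f_end + bmax * p / (p + kd_labelled * (1.0 + x / kd_competitor))


def half_displacement(p, kd_labelled, kd_competitor):
    """Competitor concentration that halves the competitor-free amplitude.

    Follows from setting the x-dependent term equal to half of its value at x = 0.
    """
    return kd_competitor * (1.0 + p / kd_labelled)


if __name__ == "__main__":
    # Hypothetical values in arbitrary but consistent concentration units.
    p, kd_l, kd_c = 100.0, 50.0, 2.0
    x_half = half_displacement(p, kd_l, kd_c)
    print(x_half)
    print(chase_signal(0.0, 0.0, 1.0, p, kd_l, kd_c))
    print(chase_signal(x_half, 0.0, 1.0, p, kd_l, kd_c))
```

The half-displacement point is higher than the competitor's own dissociation constant whenever the labelled fragment is present at concentrations comparable to its dissociation constant, which is why the full equation is fitted rather than reading the constant off the midpoint.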

Nucleotide binding assays. Fluorescence measurements were made in 96-well
format using a Safire2 plate reader with excitation and emission wavelengths of
285 and 446 nm, respectively (Supplementary Fig. 18). All experiments were carried
out with gel-filtration-purified, 6×His-tagged Get3 (D57N) in 150 µl of 50 mM
HEPES, pH 7.5, 100 mM NaCl, 5 mM MgCl2, 5% glycerol, 0.02% Tween20 and
2 mM DTT. The dissociation constant of mant-ATP was measured by incubating
1 µM Get3 (D57N) with increasing concentrations of mant-ATP (Molecular
Probes). Dissociation constants of unlabelled nucleotides were determined by
incubating 1 µM Get3 (D57N) with 1 µM mant-ATP and chasing with increasing
concentrations of the corresponding unlabelled nucleotide. In each case, blank
titrations were performed in the absence of Get3 and were subtracted from titration
curves obtained in the presence of labelled Get3. ATP and ADP concentrations were
determined by absorbance (ε259 = 15,400 M⁻¹ cm⁻¹). Dissociation constants were
determined by curve fitting as described above.
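
Nucleotide stock concentrations follow from the quoted extinction coefficient through the Beer-Lambert law, c = A/(εl). The Python sketch below is an illustrative helper, not part of the published protocol. The function name and the dilution-factor argument are introduced here, and a 1-cm path length is assumed unless stated otherwise.

```python
EPSILON_ADENINE = 15400.0  # M^-1 cm^-1, the value quoted in the text for ATP and ADP


def nucleotide_conc_uM(absorbance, dilution_factor=1.0, path_cm=1.0):
    """Stock concentration (micromolar) from the absorbance of a diluted sample."""
    molar = absorbance * dilution_factor / (EPSILON_ADENINE * path_cm)
    return molar * 1e6


if __name__ == "__main__":
    # Hypothetical reading: A = 0.154 measured on a 100-fold dilution of the stock.
    print(nucleotide_conc_uM(0.154, dilution_factor=100.0))
```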

Tail-anchored substrate release assay. Get3-substrate complexes were
assembled by in vitro translation in a phenyl- and DEAE-Sepharose-depleted
RRL” supplemented with 6×His-Get3 at ~2 µg ml⁻¹. This translation extract
lacks endogenous TA binding proteins (particularly TRC40 and Bag6).
Translation of the TA substrate in this system was verified to result in Get3-substrate
complexes by crosslinking, and was functional as judged by Get1/2-dependent
insertion (data not shown). Complexes generated by this method were
mixed with the fragments or proteoliposomes as indicated in the figure legends,
incubated for 30 min at 32 °C and subjected to crosslinking with disuccinimidyl
suberate as described previously*. The samples were denatured in 1% SDS, diluted
tenfold in 1% Triton X-100 buffer and subjected to pull-downs of 6×His-Get3
with immobilized Co²⁺ bound to chelating Sepharose (GE). The Get3-substrate
crosslink was visualized by autoradiography.

Preparation of Get3 receptor fragment complexes for crystallization. The gene
encoding native, full-length S. cerevisiae Get3 was subcloned into pET19b
(Novagen). For co-expression with N-terminal 6×His-tagged Get1(21-104),
plasmids were co-transformed into E. coli BL21(DE3)/pRIL (Novagen).
Proteins were expressed at 37 °C for 3 h by induction with 0.1 mM IPTG after
the cells reached an A600 of ~0.6. Cells were disrupted and purified by nickel-affinity
chromatography as described above for the Get1 and Get2 fragments.
Protein was eluted in buffer B containing 200 mM imidazole, and then dialysed
into 10 mM Tris, pH 7.5, 100 mM NaCl, 2 mM DTT and 40% glycerol. This was
followed by cleavage with 6×His-tagged TEV protease and removal of residual
uncleaved Get1 fragments and the 6×His-tagged TEV protease by subtractive
Ni-NTA purification. Finally, the complex was separated from excess Get1 fragments
by gel filtration. Fractions were pooled, concentrated to ~10 mg ml⁻¹ in 10 mM
Tris, pH 7.5, 100 mM NaCl and 2 mM DTT, and stored in aliquots at −80 °C.

Co-expression of native Get3 and N-terminal 6×His-tagged Get2(1-106) or
Get2(1-38) was performed as above, except that proteins were expressed at 25 °C
for 6-8 h after induction. Following cell lysis and purification by nickel-affinity
chromatography, the protein was dialysed into 10 mM Tris, pH 7.5, 200 mM
NaCl and 2 mM DTT. This was followed by cleavage with 6×His-tagged TEV
protease and subtractive Ni-NTA purification. Finally, the complex was separated
from excess Get2 fragments by gel filtration. Fractions were pooled, concentrated
to ~15-20 mg ml⁻¹ in 10 mM Tris, pH 7.5, 150 mM NaCl and 2 mM DTT, and
stored in aliquots at −80 °C.

Crystallization. Crystals of S. cerevisiae Get1(21-104) in complex with S. cerevisiae
Get3 were grown at room temperature using hanging-drop vapour diffusion by
mixing equal volumes of a protein solution with a reservoir solution containing
0.2 M K/Na tartrate, 16% PEG 3350, 0.1 M HEPES, pH 7.2, and 6% polypropylene
glycol P400. Crystals were cryoprotected in mother liquor supplemented with 20%
ethylene glycol, and flash-frozen in liquid nitrogen.

Crystals of S. cerevisiae Get2(1-38) in complex with S. cerevisiae Get3 and
ADP·AlF4⁻ were grown at room temperature using hanging-drop vapour diffusion
by mixing equal volumes of a protein solution containing 2 mM ADP,
2 mM MgCl2, 2 mM AlCl3 and 8 mM NaF with a reservoir solution containing
30% PEG 3350, 0.3 M ammonium acetate and 0.1 M Bis-Tris, pH 6.0. Crystals
were briefly soaked in mother liquor supplemented with 20% ethylene glycol and
flash-frozen in liquid nitrogen.

Structure determination and refinement. All data were collected at 100 K at
APS beamline 21-IDG (λ = 0.97856 Å) and processed using HKL2000 (HKL
Research). Data collection and refinement statistics are listed in Supplementary
Table 1.

The structure of the Get1(21-104) complex with Get3 was determined to a 
resolution of 3.0 Å by molecular replacement with PHASER, using the 
open-dimer (nucleotide-free) form of S. cerevisiae Get3 (PDB ID, 3H84; with the 
α-helical subdomain removed) as the search model. No solution could be 
obtained using the closed-dimer form of S. cerevisiae Get3 as the search model. 
Clear density was observed for the helical Get1 fragment and portions of the Get3 
α-helical subdomain in the initial electron density maps. Model building and 
refinement were carried out in PHENIX and COOT. The final model contains 
one Get3 homodimer (chains A and B), two Get1 fragments (chains C and D) and 
one zinc atom, and was refined to an R-factor of 22.4% (Rfree = 28.2%). Most 
(94.3%) of the residues are in favoured regions of the Ramachandran plot, and 
0.9% are outliers. Side-chain density is generally weakest in the α-helical 
subdomains, and no interpretable density was observed for residues 1-4, 97-134, 
155-157, 198-219, 280-284 and 352-354 in chain A; 1-4, 99-125, 191-210, 
280-284 and 352-354 in chain B; 21-35 and 103-104 in chain C; and 21-36 
and 99-104 in chain D. 

The structure of the Get2(1-38) complex with Get3 was determined to a 
resolution of 2.1 Å by molecular replacement with PHASER using a monomer 
of S. cerevisiae Get3 (PDB ID, 2WOJ; with the α-helical subdomain and ligands 
removed) as the search model. Density for the two helices of Get2(1-38) and 
portions of the Get3 α-helical subdomain was clearly visible in the initial electron 
density maps. Model building and refinement were carried out in PHENIX and 
COOT. The final model contains one Get3 homodimer (chains A and B), two 
Get2 fragments (chains C and D), two Mg²⁺-ADP•AlF₄⁻ complexes, one zinc 
atom and 231 water molecules, and was refined to an R-factor of 18.8% 
(Rfree = 23.3%). Again, most (98.0%) of the residues are in favoured regions of 
the Ramachandran plot, and 0.8% are outliers. No interpretable electron density 
was observed for residues 1-4, 101-126, 188-211, 280-284 and 353-354 in chain 
A; 1-3, 102-125, 154-158, 199-211, 280-282 and 351-354 in chain B; 1-3 and 
35-38 in chain C; and 1-3 in chain D. 
Miscellaneous. SDS-PAGE was done with 15% Tris-glycine or 12% Tris-tricine 
gels. Quantification was by phosphor imaging using a Typhoon system with 
accompanying software. Most images for the figures were generated by exposure 
to Kodak MR X-ray film. Films were digitized by scanning. Structure figures were 
generated with Pymol* and all figures were assembled using Adobe Photoshop 
and Illustrator. 
An extremely primitive star in the Galactic halo 
The early Universe had a chemical composition consisting of hydrogen, 
helium and traces of lithium; almost all other elements were 
subsequently created in stars and supernovae. The mass fraction of 
elements more massive than helium, Z, is known as ‘metallicity’. A 
number of very metal-poor stars have been found, some of which 
have a low iron abundance but are rich in carbon, nitrogen and 
oxygen. For theoretical reasons and because of an observed 
absence of stars with Z < 1.5 × 10⁻⁵, it has been suggested that 
low-mass stars cannot form from the primitive interstellar medium 
until it has been enriched above a critical value of Z, estimated to lie 
in the range 1.5 × 10⁻⁸ to 1.5 × 10⁻⁶ (ref. 8), although competing 
theories claiming the contrary do exist. (We use ‘low-mass’ here to 
mean a stellar mass of less than 0.8 solar masses, the stars that 
survive to the present day.) Here we report the chemical composition 
of a star in the Galactic halo with a very low Z (≤ 6.9 × 10⁻⁷, 
which is 4.5 × 10⁻⁵ times that of the Sun) and a chemical pattern 
typical of classical extremely metal-poor stars, that is, without 
enrichment of carbon, nitrogen and oxygen. This shows that low- 
mass stars can be formed at very low metallicity, that is, below the 
critical value of Z. Lithium is not detected, suggesting a low- 
metallicity extension of the previously observed trend in lithium 
depletion. Such lithium depletion implies that the stellar material 
must have experienced temperatures above two million kelvin in its 
history, given that this is necessary to destroy lithium. 

The Galactic halo star SDSS J102915+172927, the object of this 
Letter, has been observed with the X-Shooter and UVES spectrographs 
at the Very Large Telescope (VLT), operated by the European 
Southern Observatory in Chile. Its properties are as follows: right 
ascension, 10 h 29 min 15.15 s; declination, +17° 29′ 28″ at equinox 
2000; g-band magnitude 16.92, (g − z) = 0.59 mag, and, after correction 
for interstellar reddening, (g − z)₀ = 0.53 mag. A portion of the 
spectra in the region of the Ca II K line is shown in Fig. 1. We have 
computed and used theoretical model atmospheres and spectrum synthesis 
techniques to derive the chemical abundances provided in 
Table 1. The chemical signatures are consistent with metal production 
by ordinary core-collapse supernovae. The derived abundances, 
coupled with the upper limits on carbon and nitrogen, imply 
Z ≤ 6.9 × 10⁻⁷. This number takes into account the typical ‘excess’ 
of the α-element oxygen, [O/Fe] = +0.6. (Here [A/B] = log(N_A/N_B) 
− log(N_A/N_B)⊙ for the number N of atoms of elements A and 
B, and subscript ⊙ indicates the solar value.) Our analysis has been 
performed assuming local thermodynamic equilibrium (LTE); however, 
further work is necessary to assess the role of departures from LTE, 
especially for molecules. The estimate of non-LTE effects on magnesium 
is about +0.4 dex, which translates into a change of +0.2 × 10⁻⁷ in Z. 
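
The bracket notation defined above can be illustrated with a short calculation. The following is a minimal sketch, not the authors' analysis code: the function names are hypothetical, and the input values are the 3D-corrected A(X) and adopted solar abundances for Mg and Fe listed in Table 1, on the scale A(X) = log₁₀(X/H) + 12.

```python
def x_over_h(a_star: float, a_sun: float) -> float:
    """[X/H]: stellar minus solar abundance on the A(X) = log10(N_X/N_H) + 12 scale."""
    return a_star - a_sun


def x_over_y(a_x_star: float, a_x_sun: float,
             a_y_star: float, a_y_sun: float) -> float:
    """[X/Y] = log(N_X/N_Y)_star - log(N_X/N_Y)_sun; the hydrogen terms cancel."""
    return (a_x_star - a_y_star) - (a_x_sun - a_y_sun)


# A(X) for the star and the Sun, taken from Table 1 (Mg I and Fe I rows).
fe_h = x_over_h(2.53, 7.52)
mg_h = x_over_h(2.95, 7.54)
mg_fe = x_over_y(2.95, 7.54, 2.53, 7.52)

print(f"[Fe/H] = {fe_h:+.2f}, [Mg/H] = {mg_h:+.2f}, [Mg/Fe] = {mg_fe:+.2f}")
```

Because both ratios are logarithmic, [Mg/Fe] is simply [Mg/H] − [Fe/H], which is why the [X/Fe] column of Table 1 follows directly from the [X/H] column.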

It has been suggested that the primary discriminants between the 
formation of only massive stars (as in stellar population III) and of 
both massive and low-mass stars (as in stellar populations II and I) 
are the abundances of carbon and oxygen. In this scenario, these 
elements can provide efficient cooling of the protostellar clouds in 
the primitive interstellar medium through the fine-structure lines of 
ionized carbon and neutral oxygen. A suitable combination of the 
carbon and oxygen abundances is called the transition discriminant 
(D = log₁₀(10^[C/H] + 0.3 × 10^[O/H])), and low-mass star formation is 
believed to occur only if D ≥ −3.5. From the abundances in Table 1 
and the assumption [O/Fe] = +0.6 we have D ≤ −4.2 for SDSS 
J102915+172927, which places it in the ‘forbidden zone’ of the theory. 
If, instead of taking the upper limit on the carbon abundance, we 
assume that the carbon abundance (derived from the three-dimensional 
(3D) analysis) scales with the iron abundance, as found in other 
metal-poor stars, we have D ≤ −4.4. Our measurement cannot rule 
out the above-mentioned theoretical scenario, but it strongly supports 
the idea that, at least in some cases, low-mass stars can also form 
at lower carbon and oxygen abundances than the current estimates for 
the critical values. 
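
As a check on the numbers quoted above, the transition discriminant can be evaluated directly. This is a minimal sketch under stated assumptions: the function and variable names are hypothetical, [C/H] is taken as the 3D upper limit from Table 1, and [O/H] is not measured but set to [Fe/H] + 0.6 as assumed in the text.

```python
import math


def transition_discriminant(c_h: float, o_h: float) -> float:
    """D = log10(10**[C/H] + 0.3 * 10**[O/H])."""
    return math.log10(10.0 ** c_h + 0.3 * 10.0 ** o_h)


D_CRITICAL = -3.5      # threshold for low-mass star formation quoted in the text

fe_h = -4.99           # [Fe/H], 3D, from Table 1
c_h_limit = -4.3       # upper limit on [C/H], 3D, from Table 1
o_h = fe_h + 0.6       # assumed [O/Fe] = +0.6, so [O/H] = [Fe/H] + 0.6

d_limit = transition_discriminant(c_h_limit, o_h)
print(f"D <= {d_limit:.1f}; below the critical value: {d_limit < D_CRITICAL}")
```

Since [C/H] enters as an upper limit, the computed D is itself an upper limit, which is why the star falls in the 'forbidden zone' regardless of the true carbon abundance.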

The complete absence of the neutral lithium (Li I) resonance doublet 
at 670.7 nm, both in UVES and X-Shooter spectra, is remarkable. In 
Figure 1 | Observed spectra of SDSS J102915+172927. The spectral region 
of the Ca II H and K lines is shown (solid lines), compared to synthetic spectra 
(long dashed lines) computed with a global metallicity of Z = 1.1 × 10⁻⁶ and 
solar proportions of all elements, except for α-elements that are enhanced by 
0.4 dex over iron. Main figure: top trace, the X-Shooter spectrum (shifted 
vertically by one unit for clarity); bottom trace, the UVES spectrum. The spectra 
have been normalized to 1 in the continuum. Insets, magnified views of the Ca II 
K line (left) and the Ca II H line (right). The absorption due to interstellar gas is 
clearly detectable in both the K and H Ca II lines (labelled as ‘IS’ in the figure), and 
two hydrogen lines, H8 and Hε (labelled as ‘H’ in the figure), are visible. The 
measured radial velocity is −34.5 ± 1.0 km s⁻¹. We computed a Galactic orbit 
from the kinematic data and a distance of 1.27 ± 0.15 kpc, estimated from the 
photometry, confirming that the star belongs to the Galactic halo. 
Table 1 | Abundances in SDSS J102915+172927 

Element  A(X)    [X/H]          [X/Fe]  [X/H]          Number of  A(X)⊙ 
         3D      3D             3D      1D             lines 

C        ≤4.2    ≤−4.3          ≤+0.7   ≤−3.8          G band     8.50 
N        ≤3.1    ≤−4.8          ≤+0.2   ≤−4.1          NH band    7.86 
Mg I     2.95    −4.59 ± 0.10   +0.40   −4.68 ± 0.08   4          7.54 
Si I     3.25    −4.27 ± 0.10   +0.72   −4.27 ± 0.10   1          7.52 
Ca I     1.53    −4.80 ± 0.10   +0.19   −4.72 ± 0.10   1          6.33 
Ca II    1.48    −4.85 ± 0.11   +0.14   −4.71 ± 0.11   3          6.33 
Ti II    0.14    −4.76 ± 0.11   +0.23   −4.75 ± 0.11   6          4.90 
Fe I     2.53    −4.99 ± 0.12   +0.00   −4.73 ± 0.13   44         7.52 
Ni I     1.35    −4.88 ± 0.11   +0.11   −4.55 ± 0.14   10         6.23 
Sr II    ≤−2.28  ≤−5.2          ≤−0.21  ≤−5.1          1          2.92 
These elemental abundances are derived from our UVES spectra. The last column provides the adopted 
solar abundances on the scale A(X) = log₁₀(X/H) + 12. The atmospheric parameters adopted are 
effective temperature Teff = 5,811 K, log g = 4.0 (where g is acceleration due to gravity, in cm s⁻², at the 
surface), and microturbulent velocity 1.5 km s⁻¹. Teff was derived from the (g − z)₀ colour-Teff 
calibration. The combination of photometric and reddening uncertainties causes an uncertainty on 
Teff of 100 K; the corresponding uncertainty on [Fe/H] is 0.06 dex. We cross-checked the value of Teff 
with a fit of the Hα wings, which provided the same effective temperature within 10 K. The surface gravity 
has been fixed from the Balmer jump, as measured by the (u − g) colour. Other gravity indicators, such 
as the calcium ionization equilibrium and the wings of higher-order Balmer lines, are consistent with 
this choice. The uncertainty on the surface gravity is about 0.2 dex, and we can robustly exclude a 
surface gravity log g = 3.0 or lower, thus excluding that the star could be on the horizontal branch. We 
computed synthetic spectra with the SYNTHE code, and a one-dimensional (1D) model atmosphere 
was computed with the ATLAS 9 code. These synthetic spectra were used to perform line-profile fitting 
for all the measurable features. The 3D corrections were computed using a 3D model atmosphere from 
the CIFIST grid with Teff = 5,850 K, log g = 4.0, and metallicity 2.7 × 10⁻⁵. We were able to measure 
the abundances of only some α-elements (Mg, Ca, Si, Ti) and two iron-peak elements (Fe and Ni). The 
derived iron abundance is [Fe/H] = −4.99 (see Table 1; the 3D-corrected abundances in columns 2, 3 
and 4 should be used; the 1D abundances in column 5 are given for reference only). The α-elements are 
slightly enhanced relative to iron, [Mg/Fe] = +0.4. The Sr II line at 407.8 nm is not convincingly 
detected, giving an upper limit [Sr/Fe] ≤ −0.21, which is compatible with the general pattern of low 
[Sr/Fe] found in extremely metal-poor stars. The upper limits on carbon and nitrogen are derived by fitting 
the molecular bands of CH (G band) and NH, at 430 nm and 336 nm, respectively. Unfortunately no 
measurement of oxygen is possible in the available spectral range, from either atomic or 
molecular lines, but there is no reason to suspect that a star not enhanced in Mg, C and N should be 
over-abundant in oxygen. 


fact most of the ‘warm’ (effective temperature Teff > 5,700 K) metal-poor 
dwarf stars display a constant abundance of Li, the so-called Spite 
plateau. From the signal-to-noise ratio in the UVES spectrum of 
SDSS J102915+172927, we derive an upper limit for the Li abundance, 
A(Li) < 1.1 (at 5σ). In Fig. 2 we show the Spite plateau as a function of 
the carbon abundance, as well as a function of the iron abundance, 
which we use in turn as a proxy for Z. The sample of stars is composed 
of those with a normal carbon abundance and the carbon-rich, 
iron-poor subgiant HE 1327−2326 (ref. 4). The pictures emerging 
from Fig. 2a and b show the same morphology, with the exception of 
star HE 1327−2326, which has [Fe/H] lower than all the others, but has 
[C/H] comparable to many other stars in the sample. It is noteworthy 
that the only two unevolved stars with [Fe/H] < −4.5 have no detectable 
Li. 

The most straightforward interpretation of the Spite plateau is that 
the Li observed in the plateau stars is the Li produced in the Big Bang. 
The theoretical primordial Li abundance is a factor of 2-3 larger than 
the value observed on the Spite plateau. A number of explanations of 
this discrepancy have been proposed, which range from stellar phenomena, 
such as atomic diffusion, to new physics leading to a different 
Big Bang nucleosynthesis. Our upper limit implies that the Li 
abundance of SDSS J102915+172927 is far below the value of the Spite 
plateau. At extremely low metallicities, the Spite plateau displays a 
‘meltdown’, that is, an increased scatter and a lower mean Li abundance. 
This meltdown is clearly seen in the two components of the 
extremely metal-poor binary system CS 22876-32, which show a different 
Li content. The primary is on the Spite plateau, whereas the 
secondary is below it, at A(Li) = 1.8. The reasons for this meltdown are 
not understood. It has been suggested that a Li depletion mechanism, 
whose efficiency depends on metallicity and temperature, could 
explain the observations. If this were the case, the Li abundance in 
SDSS J102915+172927 would result from efficient Li depletion due to 
a combination of extremely low metallicity and relatively low temperature. 
For completeness, we mention that there are a small number of 
known stars which have a metallicity and effective temperature similar 
to that of other stars on the Spite plateau, but where the Li doublet is 
Figure 2 | Lithium abundance of SDSS J102915+172927 compared to that 
of other metal-poor stars. a, b, The Spite plateau is shown as a function of iron 
abundance, [Fe/H] (a), and of carbon abundance, [C/H] (b). We use carbon 
and iron in turn as proxies of the metallicity Z. The upper limits for SDSS 
J102915+172927 are from the present work. The other filled black circles are 
from refs 3 and 18; the upper limit for HE 1327−2326 is from ref. 4. The other 
open black circles are upper limits and are described below. The Li 
measurements for the binary system CS 22876-32 are from ref. 19; the upper 
limits for the three well-known Li-depleted dwarfs, G 122-69, G 139-8 and G 
186-26, are from ref. 20. For these last three stars, as well as for CS 22876-32, we 
have no measurement of the C abundances, and we therefore assumed that C 
scales with iron as in the rest of the sample. The precise placement of these stars 
along the abscissa in this diagram is of no consequence for the present 
discussion. 


not detected. The fact that such stars are found for different values of 
[Fe/H] and [C/H] suggests that Li depletion is independent of either. It 
has been suggested that Li-depleted stars could have a common origin 
with blue stragglers, an interpretation that has been reinforced by the 
discovery that these stars are also depleted in beryllium. 

Stars similar to SDSS J102915+172927 are probably not very rare. 
Only 30% of the whole SDSS survey area was accessible to our VLT 
observations. We identified 2,899 potentially extreme stars with metallicity 
Z ≤ 1.1 × 10⁻⁵ in Data Release 7. Among those 
observable with the VLT, we performed a subjective selection of the 
most promising candidates; of these, we observed six in our X-Shooter 
programme, resulting in one detection. Depending on the subjective 
bias we attribute to the last selection step, we expect 5-50 stars of 
similar or even lower metallicity than SDSS J102915+172927 to be 
found among the candidates accessible from the VLT, and many more 
in the whole SDSS sample. 
Chronological evidence that the Moon is either young 
or did not have a global magma ocean 


Lars E. Borg¹, James N. Connelly², Maud Boyet³ & Richard W. Carlson⁴ 


Chemical evolution of planetary bodies, ranging from asteroids to 
the large rocky planets, is thought to begin with differentiation 
through solidification of magma oceans many hundreds of kilometres 
in depth. The Earth’s Moon is the archetypical example 
of this type of differentiation. Evidence for a lunar magma ocean is 
derived largely from the widespread distribution, compositional 
and mineralogical characteristics, and ancient ages inferred for 
the ferroan anorthosite (FAN) suite of lunar crustal rocks. The 
FANs are considered to be primary lunar flotation-cumulate crust 
that crystallized in the latter stages of magma ocean solidification. 
According to this theory, FANs represent the oldest lunar crustal 
rock type. Attempts to date this rock suite have yielded ambiguous 
results, however, because individual isochron measurements 
are typically incompatible with the geochemical make-up of the 
samples, and have not been confirmed by additional isotopic systems. 
By making improvements to the standard isotopic techniques, 
we report here the age of crystallization of FAN 60025 
using the ²⁰⁷Pb-²⁰⁶Pb, ¹⁴⁷Sm-¹⁴³Nd and ¹⁴⁶Sm-¹⁴²Nd isotopic systems 
to be 4,360 ± 3 million years. This extraordinarily young age 
requires that either the Moon solidified significantly later than most 
previous estimates or the long-held assumption that FANs are flotation 
cumulates of a primordial magma ocean is incorrect. If the 
latter is correct, then much of the lunar crust may have been produced 
by non-magma-ocean processes, such as serial magmatism. 

To constrain the age of a FAN using multiple chronometers we (1) 
obtained a unique mafic-mineral-rich sample of FAN 60025 found by 
examining the lunar collection at the Johnson Space Center, (2) 
developed a new methodology for Pb-Pb chronometry in which samples 
are washed numerous times to remove Pb contamination, and (3) 
produced a 99.988% pure ¹⁵⁰Nd spike that allowed parent/daughter 
ratios and isotopic compositions for both ¹⁴⁶Sm-¹⁴²Nd and 
¹⁴⁷Sm-¹⁴³Nd chronometry to be measured in the same mineral fractions. 
This FAN was selected because it is coarse-grained and contains 
very low abundances of siderophile elements, indicating that it has 
been minimally altered by impact processes. Furthermore, it has 
experienced a minimal flux of thermal neutrons, eliminating the need 
for corrections on Sm and Nd isotopic measurements. Isotopic 
measurements were completed using large (about 500 mg) fractions 
of mafic minerals (mostly pyroxene) and plagioclase (see Supplemen- 
tary Information). 

The Pb-Pb and ¹⁴⁷Sm-¹⁴³Nd ages for 60025 are concordant and have 
a weighted average value of 4,360 ± 3 million years (Myr). This is the first 
study in which a single FAN has yielded consistent ages from multiple 
isotopic systems, and strongly suggests that the ages record the time at 
which the rock crystallized. The Pb-Pb age was determined on sequential 
dissolutions of a 105.4-mg pyroxene fraction. Four of five dissolution 
steps of this pyroxene mineral split define a line corresponding to an age 
of 4,359.2 ± 2.4 Myr (Fig. 1). Several mineral and whole-rock fractions 
analysed previously also plot on the 4,359.2 ± 2.4 Myr line but show 
limited Pb isotopic variation. We interpret the line defined by four 
dissolution steps of the pyroxene mineral split to represent a mixture 
of initial Pb and radiogenic Pb that evolved in a closed system. We note 
that the Px-2 (L3) fraction has a ²⁰⁴Pb/²⁰⁶Pb ratio of 0.000774, indicating 
that it is dominated by radiogenic Pb, and consequently defines 
the y intercept on Fig. 1. This is by far the most radiogenic Pb measured 
in 60025 so far. For example, in a previous detailed study of 60025, the 
most radiogenic Pb reported was ²⁰⁴Pb/²⁰⁶Pb = 0.0151 (ref. 6). The 
radiogenic ²⁰⁷Pb/²⁰⁶Pb ratio defined by the y intercept is thus well 
constrained and corresponds to the age of last closure of this system. 
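
The link between a radiogenic ²⁰⁷Pb/²⁰⁶Pb ratio and an age can be sketched numerically. The following is an illustration only, not the procedure used for Fig. 1: the decay constants and the present-day ²³⁸U/²³⁵U ratio are the conventional values commonly used in geochronology and are assumptions here (the text does not state which values were adopted), and the function names are hypothetical.

```python
import math

# Assumed conventional constants (not stated in the text).
LAMBDA_238 = 1.55125e-10   # 238U decay constant, per year
LAMBDA_235 = 9.8485e-10    # 235U decay constant, per year
U238_U235 = 137.88         # present-day 238U/235U atomic ratio


def radiogenic_207_206(age_yr: float) -> float:
    """Radiogenic 207Pb/206Pb accumulated in a closed system of the given age."""
    grown_207 = math.expm1(LAMBDA_235 * age_yr)
    grown_206 = math.expm1(LAMBDA_238 * age_yr)
    return grown_207 / grown_206 / U238_U235


def pb_pb_age(ratio: float, lo: float = 1.0e6, hi: float = 5.0e9,
              tol: float = 1.0) -> float:
    """Invert radiogenic_207_206 by bisection; the ratio rises monotonically with age."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if radiogenic_207_206(mid) < ratio:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Round trip for the age quoted in the text (4,359.2 Myr).
ratio = radiogenic_207_206(4359.2e6)
age_myr = pb_pb_age(ratio) / 1.0e6
print(f"207Pb*/206Pb* = {ratio:.4f} -> age = {age_myr:.1f} Myr")
```

The age equation has no closed-form inverse, so a numerical root-finder is needed; bisection is used here because it is simple and the function is monotonic over the interval.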

The washing and dissolution procedures used to define the Pb-Pb 
isochron were developed using small (about 10mg) fractions of 
pyroxene and plagioclase as test samples. Although the Pb isotopic 
compositions of these fractions were analysed to assess the efficiency of 
the procedure, relatively large blank corrections precluded their use in 
defining the age of 60025. The isotopic compositions of the washes 
demonstrate that most fall below the isochron towards modern 
terrestrial Pb. The Px-2 (L2) 1 M hydrofluoric acid (HF) fraction 
also lies towards terrestrial Pb on Fig. 1. Sequential washing and dissolution 
of primitive chondritic meteorites demonstrates that the first 
HF digestion step often liberates a large amount of terrestrial Pb contamination. 
This probably reflects dissolution of small particles, such 
as dust, in the weak HF acid, which were not removed in the washing 
steps. It is therefore not surprising that the Px-2 (L2) fraction lies off 
the isochron towards terrestrial contamination, which justifies removing 
this fraction from the age calculation. However, including the Px-2 
(L2) fraction in the regression only changes the calculated age by about 
10 Myr. A few fractions fall above the isochron as well, and are interpreted 
to contain a well known, if poorly understood, lunar Pb contaminant 
(see Supplementary Information). 
Figure 1 | Pb-Pb isochron diagram. For FAN 60025, an age of 
4,359.2 ± 2.4 Myr is defined by sequential dissolutions of a 105.4-mg split of the 
pyroxene mineral fraction. Uncertainties are ±2σ mean of the population of mass 
spectrometry ratios plus 50% uncertainty associated with a blank. The mean 
square weighted deviation (MSWD) is 1.6. 
Voldgade 5-7 Copenhagen, Denmark. ³Clermont Université, Université Blaise Pascal, Laboratoire Magmas et Volcans, UMR CNRS 6524, 5 rue Kessler, 63038 Clermont-Ferrand, France. ⁴Department of 
Terrestrial Magnetism, Carnegie Institution, 5241 Broad Branch Road, Northwest, Washington DC 20015-1305, USA. 
The Sm-Nd isotopic data define a ¹⁴⁷Sm-¹⁴³Nd isochron age of 
4,367 ± 11 Myr (Fig. 2a) that is 73 Myr younger than that determined 
previously for 60025 (ref. 5). Recalculating the data of ref. 5 using the 
IsoPlot program (used to calculate the ages reported here) demonstrates 
that the two ages differ by only 22 Myr. We note that the much 
larger sample sizes used here allow significantly higher precision as a 
result of running Nd as a metal instead of an oxide, as well as minimizing 
concerns about blank correction (see Supplementary Information). 
These factors, plus the concordance of Sm-Nd and Pb-Pb ages, suggest 
that the new result improves on the accuracy of earlier work. In addition, 
we have produced the first lunar ¹⁴⁶Sm-¹⁴²Nd internal isochron, 
yielding an age of 4,318 +30/−38 Myr (Fig. 2b). Although this age is 41-49 Myr 
younger than the ¹⁴⁷Sm-¹⁴³Nd and Pb-Pb ages, its relatively 
large uncertainty makes it differ from the other ages by only 8-9 Myr. 
The minor discordance between chronometers probably reflects the 
small range in measured ¹⁴²Nd/¹⁴⁴Nd (48 parts per million) and a slight 
underestimation of the analytical uncertainty on the measured 
¹⁴⁷Sm/¹⁴⁴Nd and/or ¹⁴²Nd/¹⁴⁴Nd ratios. This is illustrated in Fig. 2b 
by the position of the 4,359-Myr Pb-Pb reference line. Regardless of the 
cause of the very small discrepancy between Sm-Nd ages, the extremely 
limited variation in ¹⁴²Nd/¹⁴⁴Nd between the mineral separates is consistent 
with the young age determined for 60025. 
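
The combination of the concordant Pb-Pb and ¹⁴⁷Sm-¹⁴³Nd ages into a single value can be illustrated with an inverse-variance weighted mean. This is a sketch under stated assumptions: the text does not spell out the weighting scheme behind the quoted 4,360 ± 3 Myr, so the formula below is one common choice rather than the authors' confirmed method, and the function name is hypothetical.

```python
import math


def weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and its uncertainty.

    The uncertainties must all be quoted at the same confidence level;
    the result is then at that level too.
    """
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, 1.0 / math.sqrt(total)


# Pb-Pb and 147Sm-143Nd ages (Myr) with their quoted uncertainties.
mean_age, sigma_age = weighted_mean([4359.2, 4367.0], [2.4, 11.0])
print(f"weighted mean age = {mean_age:.1f} +/- {sigma_age:.1f} Myr")
```

Because the Pb-Pb age is far more precise, it dominates the average; the Sm-Nd age shifts the mean by less than 1 Myr.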

If 60025 represents a flotation cumulate of the lunar magma ocean, 
its young crystallization age requires the solidification of the magma 
ocean, and formation of the Moon, to have occurred later than most 
previous estimates. A young Moon is supported by the observation 
that almost all lunar crustal rocks, as well as two FANs, are the 
same age or younger than 60025. Although model-dependent, 
Figure 2 | Sm-Nd isochron diagrams. For FAN 60025, plagioclase, pyroxene 
and whole rocks define a ¹⁴⁷Sm-¹⁴³Nd age of 4,367 ± 11 Myr (a) and a 
¹⁴⁶Sm-¹⁴²Nd age of 4,318 +30/−38 Myr (b). a, εNd = −0.24 ± 0.09 (MSWD = 0.40). 
The inset to a represents the deviation of individual points from the isochron in 
epsilon units. b, ε¹⁴²Nd(CHUR) = +0.10 ± 0.04, where CHUR is chondritic 
uniform reservoir (MSWD = 0.84). ¹⁴⁶Sm/¹⁴⁴Sm = 0.0085 at 4,568 Myr ago. 
The inset to b represents deviations of lunar whole rocks analysed in refs 13 and 
21-23 from the isochron. Uncertainties are ±2σ mean of the population of 
mass spectrometry ratios. 
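
For readers unfamiliar with isochron dating, the relation between the slope of a ¹⁴⁷Sm-¹⁴³Nd isochron and the age can be sketched as follows. This is an illustration only: the ¹⁴⁷Sm decay constant is the conventional value and is an assumption here (the text does not state the value adopted), and the function names are hypothetical.

```python
import math

LAMBDA_147SM = 6.54e-12   # assumed 147Sm decay constant, per year


def isochron_slope(age_yr: float) -> float:
    """Slope of 143Nd/144Nd against 147Sm/144Nd for a closed system of this age."""
    return math.expm1(LAMBDA_147SM * age_yr)


def isochron_age(slope: float) -> float:
    """Age in years from an isochron slope: t = ln(1 + slope) / lambda."""
    return math.log1p(slope) / LAMBDA_147SM


# Slope corresponding to the 4,367 Myr age in panel a, and the round trip back.
slope = isochron_slope(4.367e9)
print(f"slope = {slope:.5f} -> age = {isochron_age(slope) / 1e6:.0f} Myr")
```

Because ¹⁴⁷Sm decays so slowly, the slope is only about 3% even after 4.4 Gyr, which is why precise isotope ratios and a wide spread in Sm/Nd between mineral fractions are needed to resolve age differences of a few tens of Myr.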
¹⁴²Nd/¹⁴⁴Nd ages for the formation of mare basalt source regions are 
also young, ranging from 4,313+ 33 Myr to 4,353+33 Myr, and are 
concordant with the ¹⁴³Nd/¹⁴⁴Nd model age of 4,360 ± 60 Myr 
for the last solidification products of the magma ocean (KREEP: 
potassium, rare-earth elements and phosphate-rich cumulates). 
Recent ¹⁸²Hf-¹⁸²W measurements have shown no radiogenic input 
to ¹⁸²W in lunar samples and constrain differentiation of the lunar 
interior to have occurred after 4.5 billion years (Gyr) ago. The new 
data for 60025 are thus consistent with formation of the Moon up to 
about 200 Myr after that of the Solar System. 

The oldest terrestrial samples are zircons found in metasedimentary 
deposits from Jack Hills, Australia, where a single zircon has yielded an 
age of 4,404 ± 8 Myr (ref. 26). It has REE abundances and an oxygen 
isotopic composition that indicates it formed from an evolved granitic 
magma and represents a fragment of continental crust. Taken at face 
value, the ages of this zircon and 60025 require differentiation on the 
Moon to occur at least 30 Myr after the differentiation of Earth. This 
implies that the Moon accreted relatively slowly after the giant impact, 
or that the Moon retained enough heat to delay cumulate formation. 

Alternatively, the young age of 60025 might indicate that it is not a 
magma ocean product, but rather was produced by a more recent melting 
event. It should be noted that Sm-Nd ages older than 60025 (Fig. 3) have 
been reported for three FANs, Y-86032 (4.43 ± 0.03 Gyr; ref. 27), 67075 
(4.47 ± 0.07 Gyr; ref. 28) and 67016c (4.53 ± 0.12 Gyr; ref. 10), and one 
Mg-suite sample 15445,17 (4.46 ± 0.07 Gyr; ref. 29). Similarly the oldest 
lunar zircon, dated at 4,417 ± 6 Myr (ref. 20), comes from the impact 
melt-breccia 72215 and presumably dates the time of crystallization of a 
KREEP-rich alkali-suite sample. Finally, some KREEP model ages 
are older than 60025, averaging 4,456 ± 65 Myr. Therefore, if 60025 is a 
flotation cumulate of a magma ocean, these old ages must be in error. 

The overlap between the ¹⁴⁶Sm-¹⁴²Nd isotopic systematics of 60025 
and those inferred for mare basalt source regions (Fig. 2b) places 
additional constraints on FAN petrogenesis. It suggests that the mare 
basalt sources and FANs formed with very similar initial Nd isotopic 
compositions at roughly the same time. Admittedly this overlap is not 
perfect, but is nonetheless remarkable given that the basalt Nd isotopic 
measurements were completed under different analytical conditions and 
variably corrected for thermal neutron capture, and the ¹⁴⁷Sm/¹⁴⁴Nd 
ratios were calculated using different geochemical models. Geological 
Figure 3 | Summary of lunar ages. The diagram illustrates the ages of the 
oldest lunar samples and model ages of cumulate source regions. Symbols refer 
to ages reported on individual samples. Filled circles represent FANs, open 
squares represent Alkali and Mg-suite samples, and filled triangles represent 
model ages. Error bars are uncertainties reported for individual age 
determinations. Ages to the right of the dashed line are inconsistent with 60025 
representing a flotation cumulate of a lunar magma ocean. Data are referenced 
in the text and the Supplementary Information. 
scenarios for the formation of 60025 must therefore include the formation 
of the low-Ti (pyroxene-rich), high-Ti (ilmenite-rich) and 
KREEP-rich source regions of the mare basalts. The success of the 
magma ocean model of lunar differentiation stems largely from its 
ability to do just this: account for the petrogenesis of all types of lunar 
source regions by a single mechanism (see, for example, ref. 4). Thus, if 
60025 is not a magma ocean cumulate, then another mechanism to 
re-equilibrate ¹⁴⁶Sm-¹⁴²Nd isotopic systematics of mafic and plagioclase-rich 
cumulates is required. 

One possible mechanism to equilibrate ancient lunar source regions 
without invoking magma ocean crystallization is for early-formed 
cumulates to have overturned as a result of density instability of the 
cumulate pile (for example, ref. 30). In this scenario of serial magmatism, 
young FANs represent solidification products of plagioclase-rich melts 
produced during overturn, whereas mare basalts represent later, more 
mafic, melts of the same cumulate sources. The advantage of this 
scenario is that it does not require dismissing the handful of older ages 
determined on lunar samples and it can potentially account for the 
variable (if somewhat suspect) ages and widespread distribution of 
FANs on the lunar surface. The disadvantage is that the lunar magma 
ocean theory is strongly founded upon the petrological, geochemical and 
isotopic characteristics of FANs such as 60025. Therefore, if 60025 is not 
a product of the magma ocean, petrologically similar rocks from the 
FAN suite cannot be used to characterize its initial solidification. 
Furthermore, the potential for planetary differentiation by magma 
ocean solidification on all rocky bodies is weakened if the very rocks 
that led to the development of the magma ocean theory are themselves 
not its byproduct. 


METHODS SUMMARY 


Thorough examination of the Apollo sample collection at the Johnson Space 
Center yielded a 1.88-g pyroxene-rich clast from FAN 60025. It contained 25% 
mafic minerals (mostly pyroxene) that were separated from plagioclase using a 
Frantz magnetic separator and hand-picking. Ten milligrams of pyroxene and 
plagioclase were used to develop washing and sequential dissolution procedures 
designed to remove Pb contamination from the splits. The 10-mg mineral splits 
were washed in distilled water, ethanol and acetone, and then in weak hydro- 
bromic acid (HBr). They were then sequentially dissolved using a variety of pro- 
gressively stronger acids. The effectiveness of washing and sequential dissolution 
to remove Pb contamination was assessed by determining the Pb isotopic com- 
position of individual washes and dissolutions. These analyses demonstrated that 
only the pyroxene fraction was suitable for age dating. The washing/sequential 
dissolution procedure developed for the 10-mg pyroxene split was applied to a 
larger 105.4-mg pyroxene split. The pyroxene was digested in HF and hydrochloric 
acid (HCl), spiked with a mixed ²⁰²Pb-²⁰⁵Pb tracer, and Pb was purified 
using HBr-HNO₃-based chemistry. Lead was run on a Thermo-Fisher Triton by 
peak hopping in a secondary electron multiplier. Samarium and neodymium were 
measured on the remaining pyroxene and plagioclase fractions as well as an 
additional whole-rock fraction. The samples were digested in HF and nitric acid 
(HNO₃) and spiked with a ¹⁴⁹Sm-¹⁵⁰Nd tracer prepared at Lawrence Livermore 
National Laboratory and having 99.988% ¹⁵⁰Nd. REEs were separated using HCl 
and methalactic acid. Total Sm and Nd procedural blanks were 8 pg and 40 pg, 
respectively. The isotopic compositions were determined with a Thermo-Fisher 
Triton at the Department of Terrestrial Magnetism using double Re filaments. 
Neodymium data were collected dynamically using 8-s integrations. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 

Sample preparation. L.E.B. visited the lunar curation facility at the Johnson Space 
Center (JSC) in October 2008 and examined several pieces of FAN 60025 search- 
ing for a clast that was enriched in mafic minerals. The sample allocated at the JSC 
weighed 1.88 g and contained approximately 25% mafic minerals consisting of 
olivine and pyroxene intergrown with large grains of plagioclase. The sample was 
crushed into coarse fragments, creating some fine powder. Several large fragments 
were set aside. Mafic and plagioclase fragments were high-graded into two frac- 
tions using tweezers. There remained a third fraction that was a 50/50 mixture of 
mafic minerals and plagioclase, but was too fine-grained to be high-graded using 
the tweezers. The coarse mafic fraction and the 50/50 mixture were crushed in a 
sapphire mortar and pestle and sieved yielding fractions of 75-200, 200-325 and 
<325 mesh. Mineral separations were completed on these fractions using a Frantz 
isobarrier separator and handpicking under dry conditions to minimize Pb con- 
tamination during processing. Pyroxene was also separated from the <325 mesh 
split of the coarse-grained mafic fraction but was too fine-grained to be hand- 
picked. Visual inspection of the mafic mineral and plagioclase fractions indicated 
that they were >99% pure. 

Pb procedures. The Pb isotopic data were collected in three analytical sessions. The 
first two sessions were preliminary and used to develop washing and sequential 
multi-step dissolution procedures to remove contamination from 10-mg splits of 
plagioclase and pyroxene mineral fractions. The Pb isotopic compositions of indi- 
vidual washes were analysed throughout the washing procedure to determine how 
much Pb contamination was removed during each washing step, and to assess the 
next step in the procedure. Once we were convinced that significant quantities of 
contamination had been removed, sequential digestion of the samples was begun. 
Preliminary Pb isotopic analysis of wash and sequential digestion fractions of the 
10-mg plagioclase split demonstrated that not enough Pb contamination was 
removed to warrant additional analysis of plagioclase in 60025. The third and final 
analytical session therefore focused on a 105.4-mg split of the pyroxene fraction. 
Washing and sequential digestion procedures used in the third session were similar, 
but not identical, to those used in the preliminary session on the 10-mg pyroxene 
split. We note that the sequential digestions of the 105.4-mg pyroxene fraction were 
the only dissolutions designed to define the age of the sample. Nevertheless, the 
isotopic compositions of all of the preliminary washes and digestions of the plagio- 
clase and both pyroxene splits are reported in the Supplementary Information. 

The fractions for all sessions were pre-washed in five cycles of distilled water, 
ethanol and acetone before progressive weak-acid washing. The Pb isotopic com- 
position of these washes was not determined. Next, the 10-mg splits were washed 
in water followed by six washes in very weak HBr. The final wash was in 0.5 M HBr. 
After washing, the 10-mg plagioclase and pyroxene splits were sequentially dis- 
solved in a multi-step procedure that used progressively stronger acids. Despite the 
small amount of Pb in the final dissolution steps of the 10-mg pyroxene split, the 
analyses of the washes and sequential dissolutions of this fraction showed two 
encouraging features for geochronology: (1) as cleaning progressed, the Pb from 
the residual material became more radiogenic, and (2) the analyses of the sequential 
dissolutions defined an approximately linear trend despite a significant correction 
for Pb blank on most analyses. 

From this information it was clear that the pyroxene required extensive cleaning 
in water, acetone, ethanol and weak acids, with a final cleaning step using 0.5 M HBr 
before starting the multi-step dissolution. The dissolution procedure successively 
used 6 M HCl, 1 M HF, 7 M HF, 7 M HNO3, and 28 M HF + 14 M HNO3 to
produce the Px-2 (L1)-(L5) fractions, respectively. Lead was separated from matrix 
elements by passing the samples twice through a HBr-HNO3-based chemical 
separation procedure using 0.055 ml of Eichrom anion resin'®. The purified Pb 
was analysed on a Thermo-Fisher Triton thermal ionization mass spectrometer 
equipped with nine Faraday detectors and one axial secondary-electron-multiplier 
ion-counting system. Lead was loaded onto previously outgassed zone-refined Re 
filaments with silica gel made from silicic acid following ref. 31. All analyses were 
made by sequentially peak jumping the ion beams into the central secondary-
electron-multiplier ion-counting system. Samples were spiked with a ²⁰²Pb-²⁰⁵Pb
double spike to allow for an internal correction for instrument mass fractionation. 
Samples are corrected for a Pb blank added during the chemical separation accord- 
ing to replicate blanks run during the same session. Lead blanks from the 
chemistry during this work ranged from 0.25 to 0.50 pg. Samples are also corrected 
for a 0.2-pg Pb loading blank. The data with 2σ errors are presented in the
Supplementary Information. 

Sm-Nd procedures. Samples for Sm-Nd analyses were derived from the aliquots 
of the plagioclase and whole-rock fractions used for Pb-Pb analyses, plus an 
additional whole-rock fraction. The samples were dissolved in 3:1 mixtures of 
concentrated HF:HNO3 in sealed Savillex beakers on a hot plate at 90 °C. Before
dissolution, the samples were spiked with ¹⁴⁹Sm and ¹⁵⁰Nd for concentration
determination. A new mixed spike consisting of 99.988% purity ¹⁵⁰Nd and
97.7% ¹⁴⁹Sm was prepared at Lawrence Livermore National Laboratory for this
work and calibrated against the AMES metal Sm—Nd mixed standard described in 
ref. 32. After three days on the hot plate, the samples were evaporated to dryness, 
treated twice with concentrated HNO3, drying in between treatments, and then
dissolved in 6 M HCl. The volume of HCl was increased until a clear solution was
obtained. Once complete dissolution was achieved, the samples were dried and
then redissolved in 5 ml of 1 M HCl + 0.1 M HF. The samples were centrifuged to
remove any undissolved material. The clear liquid was loaded on 20-cm-long, 
1-cm-diameter quartz columns filled with AG50W-X8 resin. High-field-strength 
elements were eluted in the sample loading solution and an additional 5 ml of 
1 M HCl/0.1 M HF. Major elements and large ion lithophile trace elements were
then eluted in 2.5 M HCl with heavy-REE and light-REE splits eluted in 4 M HCl.
The light-REE split was processed further to separate Sm and Nd using the pro- 
cedure described most recently in ref. 33. The column procedure used here has a 
total blank of 12 pg Nd and 2.3 pg Sm. The sample data were corrected using 
blanks of 40 pg Nd and 8 pg Sm to account for the volume of acid used for
dissolution and because each sample was split between two columns for the first 
step of the chemical separation. The smallest Sm—Nd amounts analysed here were 
the 89 ng of Nd obtained from the mafic (olivine + pyroxene) mineral split and the 
40 ng of Sm extracted from the plagioclase. The blanks listed above thus constitute 
less than 0.05% of the analysed Sm and Nd and are negligible. Concentration data 
are presented in the Supplementary Information. 
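
The claim that the blanks are negligible can be checked with a short calculation. The sketch below only uses the figures quoted in this paragraph (40 pg Nd and 8 pg Sm blanks; 89 ng Nd and 40 ng Sm as the smallest analysed amounts); the function name `blank_percent` is introduced here for illustration.

```python
# Blank contribution as a percentage of the smallest analysed amounts.
PG_PER_NG = 1000.0  # picograms per nanogram


def blank_percent(blank_pg: float, sample_ng: float) -> float:
    """Return the blank as a percentage of the analysed amount."""
    return 100.0 * blank_pg / (sample_ng * PG_PER_NG)


# Values quoted in the text above.
nd_percent = blank_percent(blank_pg=40.0, sample_ng=89.0)  # Nd, mafic split
sm_percent = blank_percent(blank_pg=8.0, sample_ng=40.0)   # Sm, plagioclase

print(f"Nd blank: {nd_percent:.3f}% of analysed Nd")
print(f"Sm blank: {sm_percent:.3f}% of analysed Sm")
```

Both percentages come out below the 0.05% bound stated in the text, with the Nd blank on the smallest Nd load being the limiting case.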

Neodymium was loaded in 3 M HCl onto a Re filament and analysed as Nd⁺
using a second Re filament for ionization. Isotope ratios were measured with the 
Department of Terrestrial Magnetism’s Thermo-Fisher Triton thermal ionization 
mass spectrometer using a two-mass-step procedure with '*°Nd and then '*Nd in 
the axial Faraday detector. This procedure calculates '**Nd/'“4Nd dynamically to 
eliminate Faraday detector biases. All other Nd-isotope ratios are measured statically 
along with potential interfering species Ce and Sm. Each step integrates the signal for 
8s, leading to 16s of signal integration for each ratio except for “’Nd/'**Nd and 
»°Nd/'4Nd that have 8s of signal integration per ratio. For the plagioclase, 480 
ratios were obtained at an average '“*Nd = 1.2 X 10 '' A. The pyroxene produced 
420 ratios at an average '47Nd = 1.6 X 10"! A. The whole-rock analysis lasted only 
for 130 ratios at an average '“’Nd = 8.5 X 10°!” A. Errors reported for the sample 
measurements are the 2σ-mean from the individual mass spectrometer runs. All
data are corrected ratio by ratio during mass spectrometry for Ce interference (using
¹⁴²Ce/¹⁴⁰Ce = 0.125) and Sm interference (using the measured Sm isotopic
composition of each sample). Six 540-ratio runs of the JNdi Nd standard at signal sizes of
2-5 × 10⁻¹¹ A were run in the same barrel as the sample analyses. The results of
these standard runs are listed in the Supplementary Information along with the 
isotopic compositions measured for each sample. Because the samples were
total-spiked, the ¹⁵⁰Nd/¹⁴⁴Nd and ¹⁴⁹Sm/¹⁵²Sm ratios are considerably higher than
standard Nd and Sm. All Nd ratios except for ¹⁴⁶Nd/¹⁴⁴Nd are corrected for mass
fractionation to ¹⁴⁶Nd/¹⁴⁴Nd = 0.7219, using exponential mass dependency. All
but ¹⁴⁶Nd/¹⁴⁴Nd and ¹⁵⁰Nd/¹⁴⁴Nd are corrected for the small spike contribution to
these isotopes from the highly enriched ¹⁵⁰Nd spike. The ¹⁴⁶Nd/¹⁴⁴Nd ratio
reported in the Supplementary Information is the measured value before
fractionation and spike correction.

Samarium was loaded in 2 M HNO3 onto a Re filament and analysed as Sm⁺
using a second Re filament for ionization. All Sm isotopes along with potential 
interference from Nd (1“°Nd) were measured statically for 180 ratios of 8-s integ- 
ration each. Corrections for Nd interference on Sm were made using the measured 
Nd isotopic composition of each sample. Sm concentrations were calculated 
assuming normal Sm as previous measurements of unspiked Sm from 60025 show 
no resolvable deficit in “°Sm that could be attributable to neutron capture!**. 
Both the whole-rock and mafic fractions measured here have spike-corrected 
*°Sm/"*?Sm ratios only marginally higher than normal Sm, in agreement with 
previous unspiked Sm measurements'*™ in different samples of 60025. The Sm 
isotopic compositions are corrected for fractionation to ¹⁴⁷Sm/¹⁵²Sm = 0.56081
with the exception of the ¹⁴⁷Sm/¹⁵²Sm ratio, which is reported as measured in the
Supplementary Information. All ratios except ¹⁴⁹Sm/¹⁵²Sm and ¹⁴⁷Sm/¹⁵²Sm
reported in the Supplementary Information have also been corrected for the
contribution from the ¹⁴⁹Sm spike.
Real-time quantum feedback prepares and stabilizes photon number states
Feedback loops are central to most classical control procedures. A
controller compares the signal measured by a sensor (system output) 
with the target value or set-point. It then adjusts an actuator (system 
input) to stabilize the signal around the target value. Generalizing 
this scheme to stabilize a micro-system’s quantum state relies on 
quantum feedback (refs 1-3), which must overcome a fundamental difficulty:
the sensor measurements cause a random back-action on
the system. An optimal compromise uses weak measurements (refs 4, 5),
providing partial information with minimal perturbation. The controller
should include the effect of this perturbation in the computation
of the actuator’s operation, which brings the incrementally
perturbed state closer to the target. Although some aspects of this
scenario have been experimentally demonstrated for the control of
quantum (refs 6-9) or classical (refs 10, 11) micro-system variables, continuous
feedback loop operations that permanently stabilize quantum systems
around a target state have not yet been realized. Here we have implemented
such a real-time stabilizing quantum feedback scheme (ref. 12) following
a method inspired by ref. 13. It prepares on demand photon
number states (Fock states) of a microwave field in a superconducting
cavity, and subsequently reverses the effects of decoherence-induced
field quantum jumps (refs 14-16). The sensor is a beam of atoms
crossing the cavity, which repeatedly performs weak quantum non-demolition
measurements of the photon number (ref. 14). The controller is
implemented in a real-time computer commanding the actuator,
which injects adjusted small classical fields into the cavity between
measurements. The microwave field is a quantum oscillator usable as
a quantum memory (ref. 17) or as a quantum bus swapping information
between atoms (ref. 18). Our experiment demonstrates that active control
can generate non-classical states of this oscillator and combat their
decoherence (refs 15, 16), and is a significant step towards the implementation
of complex quantum information operations.

A Fock state with n photons is hard to generate and very fragile.
Prepared in a cavity of damping time $T_c$, it survives on average for
$T_c/n$ before undergoing a quantum jump towards the $|n-1\rangle$ Fock state.
In contrast, classical Glauber states (ref. 19), which are coherent superpositions
of Fock states with an average photon number $\bar{n}$ and a Poisson photon
number probability distribution $P(n) = \exp(-\bar{n})\,\bar{n}^{n}/n!$, are much
easier to prepare and more robust. Glauber states are easily obtained
by coupling the initially empty cavity to a classical field source for a fixed
amount of time. This operation amounts to the translation of the field in
its phase space from the vacuum ($\bar{n} = 0$ coherent state) to a final coherent
state having an amplitude $\alpha = \sqrt{\bar{n}_0}$ with a mean photon number $\bar{n}_0$.
After the source is switched off, the field remains in a coherent state with
an exponentially decaying amplitude, $\bar{n}$ becoming $\bar{n}(t) = \bar{n}_0 \exp(-t/T_c)$.
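
The three quantities in this paragraph (the Poisson photon-number distribution of a Glauber state, the average Fock-state lifetime, and the decay of the mean photon number) can be evaluated directly. The following is a minimal sketch; the function names are introduced here for illustration, and the damping time of 65 ms is the value quoted later in the text for this cavity.

```python
import math


def poisson_photon_distribution(nbar: float, n_max: int) -> list:
    """P(n) = exp(-nbar) * nbar**n / n! for n = 0..n_max."""
    return [math.exp(-nbar) * nbar ** n / math.factorial(n)
            for n in range(n_max + 1)]


def fock_lifetime(t_c: float, n: int) -> float:
    """Average survival time T_c / n of the n-photon Fock state (n >= 1)."""
    return t_c / n


def coherent_mean_photons(nbar0: float, t: float, t_c: float) -> float:
    """Mean photon number of a decaying coherent state: nbar0 * exp(-t/T_c)."""
    return nbar0 * math.exp(-t / t_c)


T_C_MS = 65.0  # cavity damping time in ms, as quoted for this experiment

p = poisson_photon_distribution(nbar=3.0, n_max=30)
print(f"P(n=3) for a coherent state with nbar=3: {p[3]:.3f}")
print(f"Average lifetime of |3>: {fock_lifetime(T_C_MS, 3):.1f} ms")
print(f"Mean photon number after 10 ms: {coherent_mean_photons(3.0, 10.0, T_C_MS):.2f}")
```

For a coherent state with a mean of 3 photons, the probability of actually finding 3 photons is only about 0.22, which illustrates why preparing a given Fock state requires more than a simple field injection.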

Experimental methods to prepare Fock states in a cavity C start from a 
coherent state and exploit the coupling of the field to two-level 
qubits’*’°*", A deterministic procedure feeds quanta one at a time into 
the field initially in vacuum by swapping its energy with a qubit 
periodically re-pumped into its excited state’’. This method, which has 


been generalized to synthesize arbitrary superpositions of Fock states”, 
cannot counteract decoherence because it does not provide real time 
information on the actual field state in C. Fock states can also be prepared 
by a quantum non-demolition (QND) measurement performed on an 
initial coherent state with $\bar{n}_0 \neq 0$ (ref. 14). Atomic qubits probe the field
one at a time and the photon number is progressively pinned down to an
inherently random value, the probability $P(n)$ for finding n being the value
corresponding to the initial coherent field. This QND method provides
real-time information about the field state history during the process. This
information can be used for a deterministic steering of the field towards a
target Fock state $|n_t\rangle$, as well as for detection and subsequent correction of
quantum jump events. We have performed a quantum feedback experiment
by combining the detection of successive atoms with field phase-space
translations of controlled amplitudes. We thus prepare Fock states
$|n_t\rangle$ on demand and, on average, stabilize them by bringing the field back
into them after decoherence-induced quantum jumps.

The experiment is performed in a superconducting cavity C with 
$T_c = 65$ ms cooled to 0.8 K (see Fig. 1 and Supplementary Methods). It
is initially fed by the source S which prepares a coherent state with a real
amplitude $\alpha_0 = \sqrt{n_t}$. The quantum sensors are circular Rydberg atoms
prepared in B at regular time intervals (T, = 82 1s)'**’. The number of 
Rydberg atoms in each sample obeys Poisson statistics, with 0.6 atoms 
per sample on average. The atomic states $|g\rangle$ and $|e\rangle$ with principal
quantum numbers 50 and 51 are the 0 and 1 states of a qubit slightly
off-resonant with cavity C (atom-cavity detuning $\delta/2\pi = 245$ kHz).
The qubit coherence undergoes in C a light-induced phase shift linear
in the photon number (phase shift per photon $\phi_0 = 0.256\pi$). This
phase shift is measured by a Ramsey interferometer ($R_1$ and $R_2$).
Detecting each atomic sample in D provides partial information about 
the number of photons in C. 

Each iteration of the feedback loop” consists of a sample detection 
by the detector D, a cavity field state estimation by the controller K and 
a field translation performed by the actuator S (Fig. 1). In each iteration,
K first updates its estimation of the field density operator $\rho$ on the
basis of the detection outcome, and corrects this estimation by taking
into account the effect of cavity relaxation at finite temperature during
the iteration time $T_a$. It then computes the amplitude $\alpha$ of the translation
described by the operator $D(\alpha) = \exp(\alpha a^\dagger - \alpha^* a)$ (here a is the
photon annihilation operator, $\dagger$ and $*$ denote Hermitian and complex
conjugate, respectively). Because the initial and target density operators
are real, we restrict the translations to real values of $\alpha$. The field
translation minimizes a proper ‘distance’ $d(\rho_t, D(\alpha)\rho D(-\alpha))$ (defined
below) between the displaced state and the target state $\rho_t = |n_t\rangle\langle n_t|$.
Finally, at the end of each feedback loop iteration, K calculates the 
translated field’s state, which is to be used at the beginning of the next 
iteration. Note that this quantum state estimation, performed on a 
single quantum trajectory, cannot be obtained from the measurement 
data only. It also incorporates all available information on the state 
preparation, displacements and relaxation. 
Figure 1 | Scheme of the quantum feedback set-up. An atomic Ramsey
interferometer (auxiliary cavities $R_1$ and $R_2$) sandwiches the superconducting
Fabry-Perot cavity C resonant at 51 GHz and cooled to 0.8 K (the mean
number of blackbody photons is 0.05). The pulsed classical source S’ induces
$\pi/2$ pulses resonant with the $|g\rangle \to |e\rangle$ transition in $R_1$ and $R_2$ (with relative
phase $\phi_r$) on the velocity-selected (250 m s$^{-1}$) Rydberg atom qubits (purple
circles) prepared by laser excitation (blue arrow) from a rubidium atomic beam
(green arrow) in B. The field-ionization detector D measures the qubits in the
e/g basis with a 35% detection efficiency and an error rate of a few per cent
(Supplementary Methods). The actuator S feeds cavity C by diffraction on the
mirror edges. The controller K (a CPU-based ADwin Pro-II system) collects
information from D to determine the real translation amplitude $\alpha$ applied by S.
It sets the S-pulse duration through a PIN diode switch A (63-µs pulse for
$|\alpha| = 0.1$) as well as a 180° phase-shifter $\Phi$ controlling the sign of $\alpha$.
Figure 2 | Individual quantum feedback trajectories. Two feedback runs
lasting 164 ms (2,000 loop iterations) stabilizing $|n_t = 2\rangle$ (left column) and
$|n_t = 3\rangle$ (right column). The phase shift per photon, $\phi_0 = 0.256\pi$, allows
controller K to discriminate n values between 0 and 7. For $n_t = 2$, the Ramsey
phase is $\phi_r = -0.44$ rad, corresponding to nearly equal e and g detection
probabilities when n = 2. For $n_t = 3$, two Ramsey phases $\phi_{r,1} = -0.44$ rad and
$\phi_{r,2} = -1.24$ rad are used alternately, corresponding to equal e and g
probabilities when n = 2 and n = 3, respectively. a, Sequences of qubit
detection outcomes. The detection results are shown as blue downward bars for
g and red upward bars for e. Two-atom detections in the same state appear as
double-length bars. b, Estimated distance between the target and the actual
state. c, Applied $\alpha$-corrections (shown on a log scale as $\mathrm{sgn}(\alpha)\log|\alpha|$). d, Photon
number probabilities estimated by K: $P(n = n_t)$ is in green, $P(n < n_t)$ in red,
$P(n > n_t)$ in blue. e, Field density operators $\rho$ in the Fock-state basis estimated
by K at four different times marked by arrows.
Figure 3 | Photon number histograms following quantum feedback
iterations. Plots a, b, c and d correspond to the target photon number states
$n_t$ = 1, 2, 3 and 4, respectively. The red histograms correspond to about 3,900
trajectories stopped when $P(n_t)$ has reached the threshold value (0.8) for three
successive iterations. These histograms describe the field at the time when the
controller K has certified the ‘success’ of the quantum feedback procedure. The
blue histograms correspond to 4,000 trajectories stopped at a fixed time
(164 ms) and describe the feedback procedure steady-state. These histograms
are reconstructed by a method independent of the feedback estimator. After
interrupting the feedback, we record ten additional QND qubit samples
(~2 detected atoms) with a Ramsey interferometer phase $\phi_r$ chosen in sequence
among 4 values ($\phi_r$ = 1.17, 0.36, -0.44 and -1.24 rad). From these additional
qubit detections, we reconstruct the final $P_{\mathrm{QND}}(n)$ distribution for each
ensemble of trajectories by a maximum likelihood algorithm. Statistical error of
the reconstructed $P_{\mathrm{QND}}(n)$ for different target states is about 0.01-0.02 for
$n = n_t$ and $n_t + 1$, and it is significantly smaller than 0.01 for other photon
numbers (see Supplementary Methods). The green histograms give the initial
coherent state photon number distributions (a similar reconstruction was
performed with a fixed time stop immediately after initial field injection).


In an ideal experiment, with exactly one atom prepared and perfectly
detected in each sample, a detection in $|e\rangle$ or $|g\rangle$ would update the state
estimation by the mapping $\rho \to M_j \rho M_j^\dagger / \mathrm{Tr}(\rho M_j^\dagger M_j)$ (j = e, g), with
$M_e = \cos[(\phi_r + \phi_0(N + 1/2))/2]$ and $M_g = \sin[(\phi_r + \phi_0(N + 1/2))/2]$,
where $\phi_r$ is the tunable phase of the Ramsey interferometer and
$N = a^\dagger a$ the photon number operator. This qubit detection is a weak
measurement of N associated with the positive operator valued measure
(POVM) $\Pi_j = M_j^\dagger M_j$. In the actual experiment, the measurement-induced
state mapping takes into account all known and independently
measured imperfections: the possibility of 0 and 2 atoms in atomic
samples, finite detection efficiency and wrong atomic state assignment
(see Supplementary Methods for details). If, for instance, no detection
occurs, there is a probability that no atom was present in the sample, in
which case the field state does not change. There is another probability
that the detector has failed to detect a single qubit, in which case the field
should be updated according to the mapping $\rho \to \sum_j M_j \rho M_j^\dagger$. It is also
possible that the detector has missed two qubits, in which case the
updating would be $\rho \to \sum_{j,j'} M_j M_{j'} \rho M_{j'}^\dagger M_j^\dagger$ (j, j' = e, g). The probabilities
that these situations have occurred, conditioned to the fact that no
detection was made, are obtained by a classical Bayesian inference argument.
Similar Bayesian reasoning is used to infer the probabilities
which affect the mapping when one or two qubits are detected. The state
estimation also takes into account the back-action on the field of the yet
undetected samples which are on their 344-µs-long flight from C to D
(Fig. 1).
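
The ideal single-atom update can be illustrated on the photon-number populations alone. The sketch below is not the estimator used in the experiment: it assumes exactly one perfectly detected atom per sample, reads the measurement operators as M_e = cos[(phi_r + phi_0 (N + 1/2))/2] and M_g = sin[(phi_r + phi_0 (N + 1/2))/2], and ignores coherences, relaxation and all detection imperfections. Because M_e and M_g are diagonal in the Fock basis, the populations then follow a plain Bayes rule. The function names and the uniform prior are assumptions made for this example.

```python
import math

PHI0 = 0.256 * math.pi  # phase shift per photon quoted in the text (rad)


def detection_probability(outcome, n, phi_r, phi0=PHI0):
    """Ideal probability of detecting 'e' or 'g' given n photons in the cavity."""
    half_angle = (phi_r + phi0 * (n + 0.5)) / 2.0
    amplitude = math.cos(half_angle) if outcome == "e" else math.sin(half_angle)
    return amplitude ** 2


def update_photon_distribution(p, outcome, phi_r, phi0=PHI0):
    """Bayes update of the populations P(n) after one ideal detection."""
    weights = [p_n * detection_probability(outcome, n, phi_r, phi0)
               for n, p_n in enumerate(p)]
    norm = sum(weights)
    if norm == 0.0:
        raise ValueError("outcome has zero probability under the prior")
    return [w / norm for w in weights]


# Hypothetical example: flat prior over n = 0..7 and three detections.
prior = [1.0 / 8.0] * 8
posterior = prior
for outcome in "eeg":
    posterior = update_photon_distribution(posterior, outcome, phi_r=-0.44)

print([round(x, 3) for x in posterior])
```

With these operators, a Ramsey phase of -0.44 rad gives nearly equal e and g probabilities at n = 2, and -1.24 rad does the same at n = 3, which matches the phase settings described in the caption of Fig. 2.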

The control law relies on a Lyapunov-based state stabilization™. Its 
efficiency depends upon the definition of the distance d(p,p) (the 
Figure 4 | Performance of the quantum feedback procedure. a, Time
evolution of the fraction of individual field trajectories, C;,(t), that have reached 
the fidelity threshold (0.8) while converging towards $|n_t = 3\rangle$ in quantum
feedback sequences (smooth line) and in passive QND ‘trials’ (stepped line).
Statistics performed over 4,000 and 2,131 trajectories, respectively. The same
Ramsey phase settings as in Fig. 2 have been used for both feedback and QND
sequences. b, Recovery from a quantum jump: the lower plot shows probabilities
$P(n,t)$ estimated by K and averaged over 2,561 trajectories, following the
preparation at t = 0 of the Fock state $|n = 2\rangle$ by a QND measurement of an
initial coherent state (colour code shown at right for the different values of n).
The Ramsey phase settings are the same as in Fig. 2 for $n_t = 3$. The initial field
density matrix of the field estimation algorithm is diagonal and corresponds to
the red histogram in Fig. 3c. The experiment thus simulates the reaction of the
quantum feedback procedure to a $|3\rangle \to |2\rangle$ quantum jump occurring at t = 0,
after the field has converged to the target. The upper plot in b shows the
variation of the average modulus of the injection amplitude $|\alpha(t)|$. Initially zero,
$|\alpha|$ grows rapidly to a maximum while the quantum jump is reversed. The
controller finally quiets and $|\alpha|$ returns to its average steady-state value.


control Lyapunov function) between the field estimation $\rho$ and the
target $\rho_t = |n_t\rangle\langle n_t|$. In the simulations described in ref. 12, the simple
definition $d = 1 - \langle n_t|\rho|n_t\rangle$ was used. This distance vanishes when the
target is reached, but it does not discriminate the $n \neq n_t$ Fock states
whose distances to the target are all equal to 1. A better choice defines
the distance as $d = 1 - \mathrm{Tr}(\Lambda^{(n_t)} \rho)$, where $\Lambda^{(n_t)}$ is a diagonal matrix with
$\langle n_t|\Lambda^{(n_t)}|n_t\rangle = 1$ and the other elements $\langle n|\Lambda^{(n_t)}|n\rangle$ ($n \neq n_t$) decreasing
monotonically with $|n - n_t|$. In this case, d carries information not only
about the probability that the field contains $n_t$ photons, but also about
how far from $n_t$ non-negligible $P(n)$ values are found. The $\Lambda^{(n_t)}$ matrix
is optimized by performing simulations of feedback trajectories and
adjusting the $\Lambda^{(n_t)}$ coefficients to obtain the fastest convergence. Based
on this value of $\Lambda^{(n_t)}$, K searches, at each iteration step, for the $\alpha$ value
which minimizes $d(\rho_t, D(\alpha)\rho D(-\alpha))$. To reduce the computation
time, it uses an expansion of $D(\alpha)$ up to second order and determines,
under this approximation, an optimal field translation with $\alpha$ in the
[-0.1, +0.1] interval (see Supplementary Methods).
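
The search for the translation amplitude can be sketched numerically. The example below is illustrative only: it truncates the Fock space at n = 0..7, expands the displacement operator to second order in its generator as D(alpha) ~ 1 + G + G^2/2 with G = alpha (a^dagger - a) for real alpha, and uses a hypothetical weight choice Lambda_n = 1/(1 + (n - n_t)^2), whereas the coefficients used in the experiment were optimized by simulation and are not given here. A simple grid search stands in for whatever minimization the controller performs.

```python
import math

DIM = 8  # truncated Fock space, n = 0..7 (assumption of this sketch)


def matmul(x, y):
    """Plain-Python product of two square matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(DIM)) for j in range(DIM)]
            for i in range(DIM)]


def displacement_second_order(alpha):
    """D(alpha) ~ 1 + G + G^2/2 with G = alpha * (a^dagger - a), real alpha."""
    g = [[0.0] * DIM for _ in range(DIM)]
    for n in range(1, DIM):
        g[n][n - 1] = alpha * math.sqrt(n)   # a^dagger |n-1> = sqrt(n) |n>
        g[n - 1][n] = -alpha * math.sqrt(n)  # -a |n> = -sqrt(n) |n-1>
    g2 = matmul(g, g)
    return [[(1.0 if i == j else 0.0) + g[i][j] + 0.5 * g2[i][j]
             for j in range(DIM)] for i in range(DIM)]


def distance(rho, weights, alpha):
    """d = 1 - Tr(Lambda D rho D^T) for a diagonal Lambda given by `weights`."""
    d_op = displacement_second_order(alpha)
    d_transpose = [list(row) for row in zip(*d_op)]  # equals D(-alpha) here
    shifted = matmul(matmul(d_op, rho), d_transpose)
    return 1.0 - sum(weights[n] * shifted[n][n] for n in range(DIM))


def best_alpha(rho, weights, alpha_max=0.1, steps=41):
    """Grid search for the real alpha in [-alpha_max, alpha_max] minimizing d."""
    grid = [-alpha_max + 2.0 * alpha_max * k / (steps - 1) for k in range(steps)]
    return min(grid, key=lambda alpha: distance(rho, weights, alpha))


def fock_state(n):
    """Density matrix |n><n| in the truncated space."""
    return [[1.0 if i == j == n else 0.0 for j in range(DIM)] for i in range(DIM)]


N_TARGET = 3
# Hypothetical weights: equal to 1 at the target, decreasing with |n - n_t|.
weights = [1.0 / (1.0 + (n - N_TARGET) ** 2) for n in range(DIM)]

print(best_alpha(fock_state(3), weights))       # field already at the target
print(abs(best_alpha(fock_state(2), weights)))  # field one photon below it
```

In this toy model the search returns zero when the field is already in the target state and the largest allowed amplitude when it sits one photon below, which mirrors the behaviour of the actuator described in the text: quiet at the target, active after a jump. For a pure Fock state the two signs of alpha are equivalent; the sign only matters once the state carries coherences.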

Figure 2 shows the experimental records of two 164-ms-long feedback
sequences aiming at $|n_t = 2\rangle$ (left column) and $|n_t = 3\rangle$ (right
column). The measurement outcomes (Fig. 2a) are fed into K, which
updates the distance to the target (Fig. 2b) and computes the optimal
field translation applied by S (Fig. 2c). This results in the estimated
probabilities for finding $n = n_t$, $n < n_t$ and $n > n_t$ photons
in C (Fig. 2d). After an initial transient period lasting about 20 ms (240
iterations, about 50 detected atoms), the distance to the target drops to
a small value and the field reaches $|n_t\rangle$ with a fidelity $\langle n_t|\rho|n_t\rangle \approx 0.8$.
The actuator operates during the convergence phase and then quiets
down until the field undergoes a quantum jump towards $|n_t - 1\rangle$. The
distance to the target then features a sudden burst, inducing S to
become active again, until the target state is restored, in a time of about
10-20 ms (120-240 iterations). Later quantum jumps are corrected in
the same way. The rate of quantum jumps increases with n, which
explains why S is somewhat more active for $n_t = 3$ than for $n_t = 2$, with
a slightly reduced average fidelity. Similar recordings obtained for
$n_t$ = 1 and 4 are shown in Supplementary Methods.

Figure 2e shows snapshots of the density operator $\rho$ as estimated by
the feedback controller K. For each sequence, we have represented
from left to right the initial coherent state, the states after the convergence
has been observed, shortly after a quantum jump has been
detected, and finally during the recovery from the jump. Note that
the initially large off-diagonal elements $\rho_{nn'}$ ($n \neq n'$) vanish when the
field state reaches the target represented by a single peaked diagonal
matrix. A quantum jump is detected as a fast increase of the $|n_t - 1\rangle$
state probability at the expense of that of the $|n_t\rangle$ state, without build-up
of off-diagonal elements. The recovery from the jump is due to
small coherent field injections which create transient $\rho_{nn'}$ coherences
between Fock states close to $n = n_t$. Supplementary Movies 1 and 2
show the complete evolution of the field density operator during the
feedback loops displayed in Fig. 2.

For each $n_t$ value, we have recorded large sets of feedback trajectories
with two different stopping conditions: 4,000 of them are stopped by
the controller at 164 ms, as in Fig. 2 (fixed time stop), and about 3,900
are stopped when $P(n_t)$ is found by K to be greater than 0.8 in 3
successive iterations (fixed fidelity stop). For each $n_t$ and stopping
condition, the final ensemble-averaged photon number distribution
$P_{\mathrm{QND}}(n)$ is reconstructed independently from the K estimation, using
additional probe atoms sent immediately after the interruption of the
feedback loop (see Supplementary Methods). The blue and red bars in
Fig. 3 give the values of $P_{\mathrm{QND}}(n)$ obtained for the fixed time stop and
the fixed fidelity stop, respectively, for $n_t$ = 1 to 4. For reference, the
green histograms show the measured photon number distribution of
the initial coherent state, well described by the Poisson statistics. The
high values of the red bars peaking at $n_t$ show the actual fidelity of the
state preparation. The blue bar histograms are somewhat broader than
the red ones because, on average, the field resides for fractions of time
in states with $n \neq n_t$ due to the finite time it takes to correct a quantum
jump. These ‘fixed time stop’ histograms are however narrower than
the initial ones of the coherent field, with $P_{\mathrm{QND}}(n_t)$ about 2 times larger
than the corresponding value for the coherent state.

We have also analysed the speed of convergence towards the target.
Figure 4a shows the fraction of trajectories that have reached the
fidelity threshold (0.8) for $n_t = 3$ as a function of time. The convergence
time (at which 63% of the trajectories have converged) is 50 ms.
We compare this result with that of an optimized trial-and-error
projection method based on a QND measurement. The photon number of
an initial Glauber state with amplitude $\sqrt{n_t}$ is measured by QND probe
qubits sent for a fixed time t. The preparation is declared successful if
the inferred probability for $n_t$ is >0.8. Otherwise, the field is reset to the
initial state and the procedure repeated until the threshold is reached.
Choosing t = 14 ms optimizes the convergence rate. The stepped line
in Fig. 4a shows that the convergence time is now 250 ms, 5 times
longer than that of the quantum feedback method.

Finally, we have investigated the dynamics of recovery from a
quantum jump out of $|n_t = 3\rangle$. We prepare the field in the
$|n_t - 1 = 2\rangle$ Fock state, using a projective QND measurement. We then
start a feedback loop with the initial estimated photon number distribution
given by the red histogram in Fig. 3c. We thus simulate experimentally
the situation in which the field has suddenly jumped into
$|n = n_t - 1\rangle$ while K still ‘believes’ that $n = n_t$. Figure 4b presents the
time evolution of the subsequent values of the photon number distribution
$P(n,t)$ estimated by K and averaged over 2,561 trajectories.
Within about 3 ms (~7 detected atoms), K ‘realizes’ that the jump
has occurred (shown by the rapid drop of $P(n_t,t)$ and the fast rise of
$P(n_t - 1,t)$) and activates the control injection. The field comes back to
its steady state (with $P(n_t) = 0.43$, this value being limited by subsequent
random quantum jumps) within ~15 ms.

We have implemented a real-time quantum feedback procedure that 
generates photon number states on demand and stabilizes them by 
reversing the effects of decoherence-induced quantum jumps. This 
experiment, which combines quantum measurements and deterministic 
corrections, shows obvious similarities with quantum error correction 
codes (ref. 25) demonstrated with photons (ref. 26), ions (ref. 27), spins (ref. 28) or 
superconducting qubits (ref. 29). The long cavity damping time of our cavity quantum 
electrodynamics set-up is an asset, because it allows the controller to perform in 
real time complex estimation and optimization operations. We plan to 
perform a variant of this experiment in which the classical actuator 
source will be replaced by Rydberg atoms delivering single photons into 
the cavity. The same set-up could also be used to perform adaptive 
photon number measurements in which the successive qubit settings 
will be modified in real time, taking into account the results of previous 
detections”. We are also considering applying similar quantum feed- 
back strategies to the stabilization of even more exotic states, such as 
‘Schrödinger cat’ states of radiation”’. 


Received 2 June; accepted 19 July 2011. 


1. Wiseman, H. M. Quantum theory of continuous feedback. Phys. Rev. A 49, 
2133-2150 (1994). 

2. Doherty, A. C., Habib, S., Jacobs, K., Mabuchi, H. & Tan, S. M. Quantum feedback 
control and classical control theory. Phys. Rev. A 62, 012105 (2000). 

3. Wiseman, H. M. & Milburn, G. J. Quantum Measurement and Control (Cambridge 
Univ. Press, 2009). 

4. Aharonov, Y. & Vaidman, L. Properties of a quantum system during the time 
interval between two measurements. Phys. Rev. A 41, 11-20 (1990). 

5. Peres, A. & Wootters, W. K. Optimal detection of quantum information. Phys. Rev. 
Lett. 66, 1119-1122 (1991). 

6. Nelson, R. J., Weinstein, Y., Cory, D. & Lloyd, S. Experimental demonstration of fully 
coherent quantum feedback. Phys. Rev. Lett. 85, 3045-3048 (2000). 

7. Smith, W. P., Reiner, J. E., Orozco, L. A., Kuhr, S. & Wiseman, H. M. Capture and 
release of a conditional state of a cavity QED system by quantum feedback. Phys. 
Rev. Lett. 89, 133601 (2002). 

8. Cook, R. L., Martin, P. J. & Geremia, J. M. Optical coherent state discrimination using 
a closed-loop quantum measurement. Nature 446, 774-777 (2007). 

9. Gillett, G. G. et al. Experimental feedback control of quantum systems using weak 
measurements. Phys. Rev. Lett. 104, 080503 (2010). 

10. Bushev, P. et al. Feedback cooling of a single trapped ion. Phys. Rev. Lett. 96, 
043003 (2006). 

11. Kubanek, A. et al. Photon-by-photon feedback control of a single-atom trajectory. 
Nature 462, 898-901 (2009). 

12. Dotsenko, I. et al. Quantum feedback by discrete quantum nondemolition 
measurements: Towards on-demand generation of photon-number states. Phys. 
Rev. A 80, 013805 (2009). 

13. Geremia, J. M. Deterministic and nondestructively verifiable preparation of photon 
number states. Phys. Rev. Lett. 97, 073601 (2006). 

14. Guerlin, C. et al. Progressive field-state collapse and quantum non-demolition 
photon counting. Nature 448, 889-893 (2007). 

15. Brune, M. et al. Process tomography of field damping and measurement of Fock 
state lifetimes by quantum nondemolition photon counting in a cavity. Phys. Rev. 
Lett. 101, 240402 (2008). 

16. Wang, H. et al. Measurement of the decay of Fock states in a superconducting 
quantum circuit. Phys. Rev. Lett. 101, 240401 (2008). 

17. Maître, X. et al. Quantum memory with a single photon in a cavity. Phys. Rev. Lett. 
79, 769-772 (1997). 

18. Raimond, J. M., Brune, M. & Haroche, S. Manipulating quantum entanglement with 
atoms and photons in a cavity. Rev. Mod. Phys. 73, 565-582 (2001). 

19. Glauber, R. J. Coherent and incoherent states of the radiation field. Phys. Rev. 131, 
2766-2788 (1963). 

20. Varcoe, B. T. H., Brattke, S., Weidinger, M. & Walther, H. Preparing pure photon 
number states of the radiation field. Nature 403, 743-746 (2000). 

21. Hofheinz, M. et al. Generation of Fock states in a superconducting quantum circuit. 
Nature 454, 310-314 (2008). 

22. Hofheinz, M. et al. Synthesizing arbitrary quantum states in a superconducting 
resonator. Nature 459, 546-549 (2009). 

23. Haroche, S. & Raimond, J. M. Exploring the Quantum: Atoms, Cavities and Photons 
(Oxford Univ. Press, 2006). 

24. Khalil, H. K. Nonlinear Systems (Prentice Hall, 2001). 

25. Steane, A. M. Error correcting codes in quantum theory. Phys. Rev. Lett. 77, 
793-797 (1996). 

26. Lu, C.-Y. et al. Experimental quantum coding against qubit loss error. Proc. Natl 
Acad. Sci. USA 105, 11050-11054 (2008). 

27. Schindler, P. et al. Experimental repetitive quantum error correction. Science 332, 
1059-1061 (2011). 

28. Knill, E., Laflamme, R., Martinez, R. & Negrevergne, C. Benchmarking quantum 
computers: the five-qubit error correcting code. Phys. Rev. Lett. 86, 5811-5814 
(2001). 


©2011 Macmillan Publishers Limited. All rights reserved 


29. DiCarlo, L. et al. Preparation and measurement of three-qubit entanglement in a 
superconducting circuit. Nature 467, 574-578 (2010). 

30. Vitali, D., Zippilli, S., Tombesi, P. & Raimond, J. M. Decoherence control with fully 
quantum feedback schemes. J. Mod. Opt. 51, 799-809 (2004). 


Supplementary Information is linked to the online version of the paper at 
www.nature.com/nature. 


Acknowledgements This work was supported by the Agence Nationale de la 
Recherche (ANR) under the projects QUSCO-INCA, EPOQ2 and CQUID, and by the EU 
under the IP project AQUTE and ERC project DECLIC. 




Author Contributions C.S. and I.D. contributed equally to this work. Experimental work 
was carried out by C.S., I.D., X.Z., B.P., T.R., S.G., M.B., J.-M.R. and S.H., with major 
contributions from C.S., I.D. and X.Z.; P.R., M.M. and H.A. contributed to the design and 
optimization of the feedback control. 


Author Information Reprints and permissions information is available at 
www.nature.com/reprints. The authors declare no competing financial interests. 
Readers are welcome to comment on the online version of this article at 
www.nature.com/nature. Correspondence and requests for materials should be 
addressed to S.H. (haroche@lkb.ens.fr). 


1 SEPTEMBER 2011 | VOL 477 | NATURE | 77 






doi:10.1038/nature10415 


Increased forest ecosystem carbon and nitrogen 
storage from nitrogen rich bedrock 


Scott L. Morford1, Benjamin Z. Houlton1 & Randy A. Dahlgren1 


Nitrogen (N) limits the productivity of many ecosystems worldwide, 
thereby restricting the ability of terrestrial ecosystems to offset 
the effects of rising atmospheric CO₂ emissions naturally’”. 
Understanding input pathways of bioavailable N is therefore para- 
mount for predicting carbon (C) storage on land, particularly in 
temperate and boreal forests**. Paradigms of nutrient cycling and 
limitation posit that new N enters terrestrial ecosystems solely from 
the atmosphere. Here we show that bedrock comprises a hitherto 
overlooked source of ecologically available N to forests. We report 
that the N content of soils and forest foliage on N-rich metasedimentary 
rocks (350-950 mg N kg⁻¹) is elevated by more than 50% 
compared with similar temperate forest sites underlain by N-poor 
igneous parent material (30-70 mg N kg⁻¹). Natural abundance N 
isotopes attribute this difference to rock-derived N: ¹⁵N/¹⁴N values 
for rock, soils and plants are indistinguishable in sites underlain by 
N-rich lithology, in marked contrast to sites on N-poor substrates. 
Furthermore, forests associated with N-rich parent material contain 
on average 42% more carbon in above-ground tree biomass and 60% 
more carbon in the upper 30 cm of the soil than similar sites under- 
lain by N-poor rocks. Our results raise the possibility that bedrock N 
input may represent an important and overlooked component of 
ecosystem N and C cycling elsewhere. 

Globally, sedimentary rocks contain 107" g of fixed N, considerably 
more than the 10’ g of fixed N in the total biosphere®. Such N is 
primarily derived from the burial of organic matter in marine and 
freshwater sediments, where it is incorporated into rock as organic 
N or as ammonium in silicate minerals. Sedimentary and metasedimentary 
rocks are distributed globally and typically contain between 
200 and 1,200 mg N kg⁻¹, whereas high-grade metamorphic and 
igneous rocks typically contain less than 40 mg N kg⁻¹ (ref. 6). Fixed 
N is also found as nitrate salts in arid environments’, owing to the 
deposition of atmospheric N over millennia. 

Reservoirs of silicate N were identified more than half a century 
ago®, but it is generally believed that rock N is not sufficiently import- 
ant to alter the terrestrial N cycle, despite evidence to the contrary”. Just 
as weathering of phosphorus (P) from bedrock is considered the 
dominant source of P to terrestrial ecosystems"°, geological N may also 
be a long-term source of bioavailable N to plants and ecosystems. 
Bedrock has already been implicated as a source of N to aquifers’? 
and surface waters'*. Here we show, using a range of chemical and 
isotopic techniques, that bedrock contributes substantial amounts of 
N to temperate coniferous forests. In addition, we conducted a 
regional-scale investigation of 88 forest inventory and analysis (FIA) 
plots to show that total forest carbon storage is higher in ecosystems 
underlain by N-rich bedrock than in those underlain by N-poor rocks. 

We tested the hypothesis that rock weathering is an ecologically 
significant N source that can increase forest productivity and C storage 
in temperate coniferous forests of northern California, USA. The first 
site, South Fork Mountain (SFM), is underlain by mica schist derived 
from low-grade metamorphism of marine sediments dating to the 
early Cretaceous period’’. The mica phase of the schist contains at 
least 2,700 mg N kg⁻¹ as interlayer ammonium that is released to soil 
solution as the rock weathers'*. Our second site, adjacent to SFM, is 
underlain by the Bear Wallow Diorite Complex (BWDC), a plutonic 
rock dating to the Jurassic period”. 

These sites have common tectonic histories, with significant uplift 
initiated during the Pleistocene’* and modern uplift rates estimated to 
be from 1 to 4 mm yr⁻¹ (see Methods). Other than parent material, all 
other state factors are similar across sites (Table 1). Soils at both sites 
are classified as Dystroxerepts, have sandy loam to loam texture, and 
show soil development to a depth of 0.5-1.0 m. The strontium isotope 
(⁸⁷Sr/⁸⁶Sr) composition of foliage from SFM and BWDC suggests that 
the two sites receive similar fractions of Sr inputs from precipitation and 
bedrock (Supplementary Fig. 1). Although differences in mineralogy and 
Sr concentration preclude a direct comparison of bedrock weathering 
inputs between sites'®, the foliar Sr data are consistent with other soil 
metrics in suggesting that the weathering status of our sites is comparable. 
In addition, stand composition and tree age at SFM (tree age 
121 ± 7.5 years, mean ± s.e.m.) and BWDC (111 ± 7.6) are similar, and 
the ¹³C/¹²C of foliage is largely indistinguishable between sites (Supplementary 
Table 1), indicating similar plant-available water regimes. 
Atmospheric deposition contributes less than 1 kg N ha⁻¹ yr⁻¹ 
(ref. 17), whereas biological N₂-fixation inputs are estimated to be 
about 5 kg ha⁻¹ yr⁻¹, on the basis of measurements from similar 
mature forests in western Oregon (ref. 18), empirical models (ref. 19) and numerical 
simulations (ref. 3). 

In contrast, the mean N content of bedrock differs substantially: 
682 ± 50 and 55 ± 6 mg N kg⁻¹ (mean ± s.e.m.) at SFM and BWDC, 
respectively (Fig. 1a). The bulk-N content of the SFM rock is similar to 
the global average for marine pelagic sediments, but is lower than for 
most organic-rich marine sediments, which commonly exceed 
1,000 mg N kg⁻¹ (ref. 20). The differences in rock N pools translate 
to profoundly different N cycles: surface mineral soils in SFM forests 


Table 1 | State factors and soil characteristics for forests on SFM and BWDC 

Parameter                                SFM                BWDC 

State factors 
Parent material                          Mica schist        Diorite-gabbro 
Elevation (m)                            1,650-1,720        1,400-1,500 
Aspect                                   N-NE               N-NE 
Precipitation (mm)                       1,520              1,400 
Mean annual temperature (°C)             9                  10 
Dominant soil type                       Dystroxerept       Dystroxerept 

Mineral soil characteristics (0-30 cm) 
Soil texture                             Loam-sandy loam    Sandy loam 
Soil pH (1:1 soil/water paste)           4.85               5.62 
Bulk density (g cm⁻³)                    1.05               1.09 
Coarse fragments (%)                     25                 30 
Clay content (%)                         8-12               5-8 
Total C (mass %)                         5.51 ± 0.30        3.54 ± 0.34 
C/N (mol/mol)                            20.8 ± 1.0         29.5 ± 1.3 
Soil C storage (Mg ha⁻¹)                 130.2 ± 7.1        81.0 ± 7.8 
Soil N storage (Mg ha⁻¹)                 7.32 ± 0.35        3.2 ± 0.26 

Soil physical and chemical characteristics in the top 30 cm of mineral soil are reported. C and N pools in 
soil are reported as means ± s.e.m. (n = 34). 
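
The soil C storage values in Table 1 can be related to the other table rows with a short calculation. This is a sketch under stated assumptions: the authors' exact procedure is not given here, and the function below simply multiplies concentration, bulk density, depth and the fine-earth fraction (treating the coarse-fragment percentage as the fraction of the soil that holds no C). The function name is hypothetical.

```python
def soil_stock_mg_per_ha(conc_pct, bulk_density_g_cm3, depth_cm, coarse_frag_pct):
    """Element stock in Mg ha^-1 for one soil layer.

    conc_pct: element concentration of the fine fraction (mass %)
    bulk_density_g_cm3: bulk density (g cm^-3)
    depth_cm: layer thickness (cm)
    coarse_frag_pct: coarse fragments (%), assumed to carry no C or N
    """
    fine_fraction = 1.0 - coarse_frag_pct / 100.0
    grams_per_cm2 = (conc_pct / 100.0) * bulk_density_g_cm3 * depth_cm * fine_fraction
    return grams_per_cm2 * 100.0  # 1 g cm^-2 equals 100 Mg ha^-1

# Table 1 values for the top 30 cm of mineral soil
sfm_c = soil_stock_mg_per_ha(5.51, 1.05, 30, 25)
bwdc_c = soil_stock_mg_per_ha(3.54, 1.09, 30, 30)
```

Under these assumptions the Table 1 inputs give about 130 Mg C ha⁻¹ for SFM and about 81 Mg C ha⁻¹ for BWDC, in line with the reported storage values.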


1Department of Land, Air and Water Resources, University of California - Davis, California 95616, USA. 

Figure 1 | Total nitrogen in rock, soil and foliage pools for SFM and BWDC 
forests. a-c, Total nitrogen in rock (mg N kg⁻¹, n = 18) (a), soil (%N, n = 34) 
(b) and plant foliage (%N, n = 80) (c) in the BWDC (black) and SFM (grey) 
forests. d, Foliar nitrogen expressed in µg N per needle, to account for biomass 
dilution. Calocedrus decurrens is not presented in d because of scale-leaf rather 
than needle-leaf morphology. Error bars represent s.e.m. Species sampled: Ac, 
Abies concolor; Pl, Pinus lambertiana; Pp, Pinus ponderosa; Cd, Calocedrus 
decurrens. Asterisk, P < 0.05; two asterisks, P < 0.01; three asterisks, P< 0.001. 


contain significantly more N than those in BWDC forests (3,026 ± 149 
and 1,426 ± 113 mg N kg⁻¹ (mean ± s.e.m.), respectively; Fig. 1b). In 
addition, soil C/N ratios in the top 30 cm of mineral soil are significantly 
lower at SFM (20.8 mol:mol) than at BWDC (29.5). Elevated total C 
concentrations in SFM soil result in substantially more C storage in the 
top 30 cm of soil at SFM (130 ± 7 Mg C ha⁻¹; mean ± s.e.m.) than in 
BWDC (81 ± 8 Mg C ha⁻¹; Table 1). These differences are consistent 
with a California statewide soils database (n = 183; Supplementary 
Fig. 2); C and N storage at BWDC is similar to the state average for 
comparable ecosystems (74.9 Mg C ha⁻¹, 3.89 Mg N ha⁻¹), whereas 
SFM has among the highest C and N contents observed. 

Further, N enrichment in rocks and soils is readily apparent in the 
tree foliage. On average, conifer needles at SFM contain 50% more N per 
needle than at BWDC; this difference is observed in three of four 
sampled species (Fig. 1c); trees on SFM contain 7-30% more N on a 
foliar mass basis. Calocedrus decurrens was the most N-enriched: 
1.45 ± 0.04% N (mean ± s.e.m.) at SFM, versus 1.10 ± 0.03% N at 
BWDC. The other trees showed similar but less substantial N enrichment 
(Abies concolor, 1.14 ± 0.04% N and 1.00 ± 0.04% N; Pinus 
ponderosa, 1.50 ± 0.06% N and 1.33 ± 0.06% N; Pinus lambertiana, 
1.33 ± 0.04% N and 1.25 ± 0.03% N at SFM and BWDC, respectively). 

Under N-rich conditions, plants can either concentrate N in foliage 
(for example, Calocedrus) or build more foliage (or do both; for 
example Abies and Pinus), the former leading to higher N concentra- 
tions in foliage, the latter to no major changes in N contents. At SFM, 
needle biomass in pines is roughly 70% higher than at BWDC 
(Supplementary Table 2), implying significant gains in biomass with 
added N. Examining N contents on a per-needle basis to account for 
nutrient dilution” reveals even more profound differences between 
sites: SFM trees show 35-90% more N than those at BWDC 
(Fig. 1d). It therefore seems that forest responses to geological N inputs 
are manifested as higher foliar biomass production, in addition to the 
higher foliar N concentrations observed. 

Next, we used natural-abundance ¹⁵N/¹⁴N stable isotopes (δ¹⁵N, 
parts per thousand (‰) versus air) to trace the movement of N from 
rocks to soils and plants (Fig. 2). Nitrogen-isotope data for the BWDC 
site are consistent with expectations for other similar N-limited temperate 
conifer forests”*: foliar δ¹⁵N (−1.59 ± 0.19‰; mean ± s.e.m.) 
is depleted relative to soil (1.96 ± 0.32‰), whereas δ¹⁵N in rocks 
(18.37 ± 1.55‰) is substantially elevated compared with that in plants 
Figure 2 | Nitrogen isotope values of the rock-soil-plant system. a, SFM 
forest; b, BWDC forest. The median and range are plotted for each component. 
The dashed line represents the approximate isotope value for total atmospheric 
N inputs. 


and soils. In contrast, foliar δ¹⁵N at SFM (3.10 ± 0.17‰) is 4.7‰ 
higher than at BWDC and is essentially indistinguishable from that 
of soils (3.6 ± 0.17‰) and rock (3.3 ± 0.22‰). This points to a direct 
link between weathering of rock N and the elevated N status of SFM, 
and conforms with general patterns of plant and soil δ¹⁵N in N-rich 
ecosystems”. Moreover, the lack of differentiation between the 
¹⁵N/¹⁴N of rock inputs and soils at SFM indicates little or no isotopic 
expression through N losses at SFM. This agrees with observations of 
high-nitrate leachates in soils’* and streams’” associated with N-rich 
rock substrates, given that N leaching does not seem to impart a major 
fractionation of N isotopes in comparison with gaseous losses of N 
(ref. 24). Finally, the contrast of plant and soil δ¹⁵N pools at SFM and 
BWDC is consistent with models that show increased isotopic expression 
of ectomycorrhizal N transfer under low availability of N (ref. 25) 
(that is, at BWDC), although we do not observe differences in foliar 
δ¹⁵N between ectomycorrhizal and arbuscular mycorrhizal species at 
either site (Supplementary Fig. 3). 

Rising levels of atmospheric CO₂ and climate change have renewed 
interest in links between N and C cycles, especially in high-latitude 
forests*. Nitrogen limitation of extra-tropical forests is widespread”®; 
new N inputs therefore have the potential to allow for more CO₂ 
uptake and storage on land”’, thus affecting the pace and magnitude 
of global climate change. Using FIA data from forests developing on 
bedrock similar to that at SFM (eastern Franciscan, N-rich) and BWDC 
(western Klamath, N-poor) geological provinces of Northwest 
California, we show that geological N may also contribute to enhanced 
forest productivity and C storage on land. 

Holding other state factors constant, our analysis suggests that forests 
on bedrock with high N-enrichment potentials (metapelites formed 
under a low geothermal gradient), on and within the SFM region, 
contain 42% more C in above-ground tree biomass than similar forests 
in the region that are underlain by N-depleted bedrock (P = 0.009, 
R² = 0.66; Fig. 3). Although lithology in our model represents the 
sum of the parent material influences on tree C stocks, the physical 
properties inherited from parent material, such as soil texture and 
depth, are taken into account by the available water-holding capacity 
parameter (P = 0.001). This implies that the enhanced C storage on 
putatively N-rich lithology is most associated with differences in nutri- 
ent release from rocks. Given the N-limited propensity of such forests, 
these region-wide results indicate higher productivity and C storage 
owing to bedrock N, which is consistent with the higher N and C stocks 
at SFM (Table 1 and Fig. 1) and with Forest Service data suggesting that 
Figure 3 | Carbon in above-ground tree biomass for forests growing on 
N-rich and N-poor lithology. Plot of carbon (Mg ha⁻¹) against stand age for 
FIA plots (n = 88) on N-rich (crosses) and N-poor (triangles) lithologies used in 
the model. To account for model axes not shown, we present detransformed 
model estimates for sites on N-rich (solid line) and N-poor (dashed line) 
lithologies, holding other model parameters constant at nominal values to 
illustrate differences in carbon storage attributable to lithology. The grey area 
indicates lithology s.e.m. Our analysis estimates that sites on N-rich lithology 
contain 42% more C in above-ground tree biomass than sites on N-poor lithology 
(P = 0.009; adjusted R² = 0.66) after accounting for confounding state factors. 


SFM contains the most productive Douglas fir forests remaining in 
California’®. 

Our results raise the possibility that rock weathering may be a sig- 
nificant source of N to terrestrial ecosystems underlain by N-rich 
substrates elsewhere. There are many documented cases in which 
the N budgets of forests are out of balance, often being in need of 
substantially higher N input rates to account for N accumulation in 
soil and plants”. In cases where bedrock is enriched in N, parent- 
material N could explain some (or all) of the missing N inputs. For 
example, using the N isotope data to devise a set of mass balance 
models, we estimate that rock N sources contribute 30-100% of the 
ecosystem N inputs at SFM, depending on the degree of isotopic 
expression by N losses (Supplementary Table 3). These results are 
comparable to simple uplift models (rock N = 47% of N inputs) and 
weathering experiments in the laboratory (rock N = 64% of N inputs). 
In terms of fluxes, these various approaches point to substantial N 
inputs: 3.0-10.9 kg ha⁻¹ yr⁻¹ by rock weathering at SFM, potentially 
more than doubling known inputs from the atmosphere (that is, fixation 
plus deposition equal to 6 kg N ha⁻¹ yr⁻¹ (refs 3, 17-19)). Given 
that sedimentary and metasedimentary rocks contain 99% of the global 
fixed N and cover roughly 75% of the Earth’s land surface”, the poten- 
tial for bedrock N to stimulate productivity and C storage in the ter- 
restrial biosphere seems globally significant. 
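
The relation between the flux estimates and the percentage contributions quoted above can be checked with simple arithmetic. This is only a sketch, not the authors' isotope mass balance (which is in their Supplementary Table 3): it assumes that rock weathering and the atmosphere (fixation plus deposition, 6 kg N ha⁻¹ yr⁻¹) are the only two inputs. The function names are hypothetical.

```python
ATMOSPHERIC_INPUT = 6.0  # kg N ha^-1 yr^-1, fixation plus deposition (from the text)

def rock_fraction(rock_flux, atmospheric_flux=ATMOSPHERIC_INPUT):
    """Share of total N inputs supplied by rock weathering (two-source assumption)."""
    return rock_flux / (rock_flux + atmospheric_flux)

def rock_flux_from_fraction(fraction, atmospheric_flux=ATMOSPHERIC_INPUT):
    """Rock N flux implied by a given share of total inputs (inverse of the above)."""
    return fraction / (1.0 - fraction) * atmospheric_flux

low_share = rock_fraction(3.0)    # lower flux estimate from the text
high_share = rock_fraction(10.9)  # upper flux estimate from the text
uplift_flux = rock_flux_from_fraction(0.47)  # flux implied by the 47% uplift-model share
```

Under this two-source assumption, the 3.0-10.9 kg ha⁻¹ yr⁻¹ range corresponds to roughly one-third to two-thirds of total N inputs, and a rock flux equal to the atmospheric input would correspond to exactly half.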


METHODS SUMMARY 


One-year-old sun foliage from the mid-canopy of mature conifer trees was col- 
lected with a pole saw in autumn 2008, after full needle elongation. Foliage was 
rinsed with deionized water and dried at 50°C for 48h. Needle biomass was 
determined by a 100-needle count on dried foliage. Samples were ground and 
analysed for total C and N, δ¹⁵N and δ¹³C by the University of California - Davis 
Stable Isotope Facility with a SerCon Hydra 20/20 isotope ratio mass spectrometer 
(IRMS). Isotope values are reported in parts per thousand (‰) relative to Vienna 
PeeDee Belemnite for ¹³C/¹²C, and air N₂ for ¹⁵N/¹⁴N, using standard delta (δ) 
notation. Surface mineral soils were collected to a depth of 30 cm, dried in air, 
sieved to 2 mm, milled to pass a 200-µm sieve, and then analysed by IRMS for total 
C and N, δ¹⁵N and δ¹³C. Bulk density was determined by the core method, and soil 
texture by laser diffraction. Minimally weathered bedrock samples were collected 
from outcrops at each site, cut with a slab saw to remove weathered surfaces, 
treated with 5% hydrogen peroxide for 24 h to remove any remnant surficial 
organic matter, and pulverized with a carbide-steel shatter box to pass a 75-µm 
sieve. Samples were analysed by IRMS for total C and N, δ¹⁵N and δ¹³C. FIA plot 
data, including intensification plots, were obtained from the United States Forest 
Service. Above-ground carbon stocks were calculated with the Jenkins equations 
in the ‘Fire and fuels extension’ of the Forest Vegetation Simulator. Student’s 
t-tests were used to compare differences between sites in total N in rocks, soil 
and foliage. The FIA model took the form of a multiple linear regression. Statistical 
tests and model fitting were performed in R. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 

Field sampling and laboratory analysis. Sampling areas were located near 
40° 18'N 123°18’ W and 40° 26’N 123°21’ W for the SFM and BWDC sites, 
respectively. Within each sampling area, three locations were chosen at random 
for collection of foliage, soils and rock. A pole saw was used to collect sun needles 
from 10 m above the forest floor from co-dominant conifer trees in the autumn of 
2008. Three clippings from each tree were bulked into a single sample. One-year- 
old needles were removed from the branches by hand, and needles showing 
damage, discoloration or herbivory were discarded. The separated needles were 
washed with deionized water and then dried for 48h at 50°C. A subsample of 
dried foliage was ground with a Wiley mill and then pulverized with a ball mill 
before chemical analysis. From the remaining sample, 100 needles were chosen at 
random and weighed to determine a 100-needle mass. 

Ground samples were placed into tin capsules and analysed for total C and N, 
¹³C/¹²C and ¹⁵N/¹⁴N on a Sercon 20/20 elemental analysis-IRMS at the University 
of California - Davis Stable Isotope Facility. Isotope values are reported in parts 
per thousand (‰) relative to Vienna PeeDee Belemnite for ¹³C/¹²C and air N₂ for 
¹⁵N/¹⁴N, using standard delta (δ) notation*”. Foliage replicates had standard deviations 
for total C, total N, δ¹³C and δ¹⁵N of less than 0.1%, 0.05%, 0.05‰ and 0.2‰, 
respectively. 
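
For readers unfamiliar with delta notation, the conversion between an isotope ratio and a δ value can be sketched as follows. The formula δ = (R_sample / R_standard − 1) × 1000 is the standard definition; the numerical ¹⁵N/¹⁴N ratio used for atmospheric N₂ is an assumed reference value for illustration and does not come from the text.

```python
R_AIR_15N = 0.0036765  # assumed 15N/14N ratio of atmospheric N2 (illustrative value)

def delta_permil(r_sample, r_standard):
    """Delta value in parts per thousand relative to a standard ratio."""
    return (r_sample / r_standard - 1.0) * 1000.0

def ratio_from_delta(delta, r_standard):
    """Isotope ratio corresponding to a given delta value (inverse of delta_permil)."""
    return r_standard * (1.0 + delta / 1000.0)

# Round trip for the mean foliar value reported at SFM (3.10 per mil)
r_foliage = ratio_from_delta(3.10, R_AIR_15N)
d_foliage = delta_permil(r_foliage, R_AIR_15N)
```

A sample with the same ratio as the standard has δ = 0, and positive values indicate enrichment in the heavy isotope relative to the standard.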

We dug three 50 cm × 50 cm pits in each sampling location down to the C, horizon 
(60-80 cm) to describe soil taxonomy. Additional smaller pits (0.3 m × 0.3 m) were 
used to collect representative mineral soil samples for the top 30 cm of the profile for 
chemical analysis. Soil bulk density and volume of coarse fragments were quantified 
in the surface mineral horizons with the use of the core method”. Soil texture was 
determined by laser diffraction*’. Soils were air-dried in the laboratory and sieved 
to 2mm; fine roots were then removed with forceps. The soil fine fraction was 
ground with a ball mill, weighed into tin capsules and analysed for elemental and 
isotopic analysis (see foliar methods for details). Soil replicates had standard 
deviations for total C, total N and δ¹⁵N of less than 0.02%, 0.005% and 0.3‰, 
respectively. 

Rock samples were recovered from minimally weathered outcrops with a crack 
hammer. The weathering rind was removed with a lapidary slab saw. The sample 
was then treated with 5% hydrogen peroxide solution for 24h to remove surficial 
organic matter, dried and then crushed with a hydraulic press to obtain particles 
with diameters of less than 10mm. Samples were then washed with deionized 
water and dried at 110 °C for 48 h. A 60 g subsample was pulverized with a carbide-steel 
shatter box to pass a standard US 200 mesh (74-µm) sieve. Samples were 
weighed into tin capsules and submitted to the Stable Isotope Facility for elemental 
and isotopic analysis. Replicates for SFM schist had standard deviations for total N 
and δ¹⁵N of 10 mg kg⁻¹ and 0.3‰, respectively. Replicates for rock samples from 
the BWDC site had higher analytical uncertainty, with standard deviations of total 
N of 15 mg kg⁻¹ and a δ¹⁵N of 2‰. In addition, two independent methods were 
used to quantify total N in a subset of samples: vanadium oxide catalysed combustion* 
and HF/HCl digestion followed by conductimetric NH₄ quantification”. 
Both methods resulted in N recovery within 10% of the standard elemental 
analysis-IRMS method. 

Comparison of sample means of total N in rock, soil, and foliage between SFM 
and BWDC sites was performed with Student’s t-test in R°**. Homogeneity of 
variance was tested with the Shapiro-Wilk test for normality. 

For Sr analysis, rock and foliage samples were prepared as described above; 0.25 g of 
foliage was digested with ultrapure nitric acid and hydrogen peroxide in a CEM MDS-2100 
microwave digester. Snow was collected immediately after a storm in March 2009 
and passed through pre-rinsed 0.45-µm Millipore Millex syringe filters. Sr extractions 
for all samples were performed with Eichrom Sr-Spec resin and analysed on the Nu 
Plasma HR multi collector-inductively coupled plasma-mass spectrometer (MC-ICP-MS) 
at the University of California - Davis Interdisciplinary Center for Plasma 
Mass Spectrometry. Sr was corrected to ⁸⁶Sr/⁸⁸Sr = 0.1194 to account for instrumental 
mass fractionation, and then normalized to SRM 987 (⁸⁷Sr/⁸⁶Sr = 0.710248). The 
SRM 987 standard averaged ⁸⁷Sr/⁸⁶Sr = 0.710269 ± 0.000040 (2 s.d., n = 25). 
Method blanks were less than 0.4 and 1.3 ng for rock and foliage, respectively. 
Analysis of FIA plot data. A total of 130 FIA plots were identified within the 
designated study area and included both the standard and intensification plots for 
rare forest types*’ *”. Potential N-bearing lithologies were identified with available 
geological data'*“°*, field observations and geochemistry data from our rock 
collection. Potential N-rich sites occur on metasediments (blueschist facies) in 
the eastern Franciscan belt, whereas N-poor sites are located in terranes of western 
Klamath that have undergone substantial contact metamorphism. Of the 130 sites, 
89 were initially selected for inclusion in the analysis by using the following 
criteria: 

1. We selected for mixed pine, Douglas fir, Douglas fir-white fir, and Douglas 
fir-ponderosa pine forest types by using CalVeg classification data**. 


2. Plots occurring on ultramafic/serpentine soils were excluded by using spatial 
data from the Trinity serpentine soil survey’”. 

3. Plots were located on slopes less than 80% and had mean annual precipitation 
values from 1,000 to 2,000 mm. 

4. Plots had a mean stand age (defined below) of more than 20 years and less 
than 200 years. 

5. The Stand Visualization System (SVS)** was used to prescreen for stands that 
exhibited significant mortality in the largest diameter classes resulting from 
extreme disturbance events such as high-intensity wildfires or insect and microbial 
pathogens. Stands in which more than 75% of the canopy area was occupied by 
dead standing trees in combination with 50-80% mortality in the largest tree 
diameter classes were removed from the analysis. 

A total of 88 plots were included in the final analysis (Supplementary Table 4); 
one was removed after an analysis of disturbance regimes using fire history and 
stand density characteristics (discussed below). 

Seven parameters were used as potential explanatory variables for the analysis. 
Supplementary Table 5 provides information on the central tendency and data 
distribution for each parameter. Non-significant variables (P>0.05) were 
dropped from the final analysis during model fitting (see discussion below). The 
full set of independent parameters tested in the analysis includes: 

1. Stand age. The FIA intensification plot database does not include information 
on stand age; we therefore estimated stand age (A_s) using a basal area-weighted 
function of measured conifer trees within each plot, calculated from 

$$A_s = \frac{\sum A_i A_b}{\sum A_b} \qquad (1)$$

where A_i is the age of an individual measured tree, and A_b is a calculated value 
representing the basal area per hectare. Age data from live, measured conifer trees 
only were used in this determination. 

2. Slope. Percentage slope was taken directly from the FIA database and con- 
verted to decimal fractions. In the case of a split plot, the mean slope across the four 
subplots was used. 

3. Available water-holding capacity (AWHC). An estimate of the total amount 
of water the soil profile can hold, based on soil texture, depth and coarse fragments 
(fraction larger than 2mm in diameter). AWHC values are spatially averaged 
values of SSURGO and STATSGO data, using a 1-km² grid”. 

4. Insolation. Annual solar input (W h m⁻²) was derived from the 1/3-arcsecond 
(10-m) National Elevation Data set (NED) using the Solar Analyst function*! in 
ArcGIS 9.3 (ref. 52). Insolation values were linearly rescaled from 0 to 1. 

5. Precipitation. Mean annual precipitation was derived from the PRISM 
30-arcsecond (800-m) data set (1971-2000)**. 

6. Temperature. Mean annual temperature was derived from the PRISM 
30-arcsecond (800-m) data set (1971-2000)**. 

7. Lithology. Categorical variable that indicates whether the plot is located on 
lithologies with nitrogen statuses similar to SFM (eastern Franciscan) or BWDC 
(western Klamath). 
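
The basal area-weighted stand age in equation (1) of item 1 above can be sketched in a few lines. This is an illustration under assumptions, not the authors' code (they report using R): each measured tree is taken to carry one age and one basal-area weight, and the function name and input layout are hypothetical.

```python
def stand_age(tree_ages, basal_areas):
    """Basal area-weighted mean age: sum(age * basal area) / sum(basal area)."""
    if not tree_ages or len(tree_ages) != len(basal_areas):
        raise ValueError("need one basal-area weight per measured tree")
    total_basal_area = sum(basal_areas)
    if total_basal_area <= 0:
        raise ValueError("total basal area must be positive")
    weighted = sum(age * ba for age, ba in zip(tree_ages, basal_areas))
    return weighted / total_basal_area

# Hypothetical plot: one 100-year-old tree and one 200-year-old tree,
# the older tree representing three times the basal area per hectare.
example_age = stand_age([100, 200], [1.0, 3.0])
```

The weighting pulls the estimate towards the larger trees, which dominate the stand's basal area, instead of giving every measured tree equal influence.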

The response variable in the model was above-ground carbon ($C_a$) in standing 
live trees (Mg C ha⁻¹). $C_a$ was calculated on a per-tree basis by using Jenkins 
allometric biomass equations” in the ‘Fire and fuels’ extension® of the Forest 
Vegetation Simulator (FVS)°”. 

To account for potential differences in disturbance regimes between regions, we 
used fire history’* in combination with stand density information to identify plots 
where disturbance had substantially influenced carbon storage. A total of 26 of the 
initial 89 plots had a record of fire since 1920 (29%), with 11 recorded incidents in 
the eastern Franciscan and 15 in the western Klamath. Only 14 of the 26 fires 
occurred within the past 50 years, with 40% occurring in the eastern Franciscan. 
Inclusion of a categorical parameter in the model to account for the presence or 
absence of fire was non-significant (P > 0.05) at the three timescales examined (15, 
30 and 50 years). 

We compared stand density with basal area across plots to identify sites where 
fire significantly altered stand structure. To compare sites directly, we calculated 
Reineke’s stand density index (SDI)° for each plot to account for the negative 
exponential relationship (reverse J-shaped distribution) between tree size and 
density®. After a log-log transformation to account for data heteroscedasticity, 
we found a strong linear relationship between log(SDI) and log $C_a$ across lithologies 
and fire histories (R² = 0.92; Supplementary Fig. 4). We identified and 
removed a single plot that deviated significantly from the population SDI:basal area trend. 
The removed plot, located in the western Klamath (N-poor lithology), was 
194 years old with only 59 Mg C ha⁻¹ in above-ground tree biomass. 
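
The text does not say which form of Reineke's index was computed. As a hedged illustration, the sketch below uses the commonly cited metric form, SDI = N (Dq / 25.4)^1.605, where N is trees per hectare and Dq is the quadratic mean diameter in centimetres. The function names and the example stand are assumptions for illustration only.

```python
import math


def quadratic_mean_diameter(diameters_cm):
    """Quadratic mean diameter (cm) of a list of stem diameters."""
    return math.sqrt(sum(d * d for d in diameters_cm) / len(diameters_cm))


def reineke_sdi(trees_per_ha, qmd_cm):
    """Reineke's stand density index, metric form (assumed here):
    stand density scaled to a reference diameter of 25.4 cm."""
    return trees_per_ha * (qmd_cm / 25.4) ** 1.605


# Hypothetical stand: 500 trees per hectare, all 25.4 cm in diameter,
# so the index equals the stem density itself.
example_sdi = reineke_sdi(500, quadratic_mean_diameter([25.4, 25.4, 25.4]))
print(example_sdi)
```

Because the index rescales density to a common reference diameter, plots with many small trees and plots with few large trees can be compared on one axis.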

We used R** to fit a multiple linear regression model, accounting for two-way 
interactions between local state factors. Initial model parameter selection was 
performed with the ‘step’ function in R—an implementation of Akaike’s informa- 
tion criterion®’. In favour of parsimony, the final model included only parameters 
deemed significant (P ≤ 0.05) by type II analysis of variance with the ‘Anova’ 
function from the Companion to Applied Regression (CAR)® library in R°*. 
The final model took the form 


$$\log C_a = \beta_0 + \log x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + \varepsilon \qquad (2)$$


where $\beta_0$ is the intercept, $x_1$–$x_6$ represent coefficients fitted by the model ($x_1$, stand 
age; $x_2$, lithology; $x_3$, temperature; $x_4$, AWHC; $x_5$, insolation; $x_6$, AWHC × insolation) 
and $\varepsilon$ represents error. An ANOVA table for the final model is presented in 
Supplementary Table 6. Precipitation (P = 0.12) and slope (P = 0.49) were found 
to be non-significant in the final model. Collinearity was assessed by using the 
variance inflation factor from the CAR package (Supplementary Table 7). Model 
coefficient tables (Supplementary Table 7), model fit information (Supplementary 
Fig. 5) and residual plots (Supplementary Fig. 6) are also presented. 

Mass balance modelling methods for determination of bedrock N flux. We 
developed four simple models derived from natural-abundance N isotope data, 
tectonic uplift rates and laboratory weathering experiments to better constrain N 
input fluxes from bedrock into our sites. Our first two models apply principles of 
isotopic mass balance: model A estimates rock N contribution to the ecosystem 
under the assumption of no isotopic fractionating loss, whereas model B assumes 
that fractionating loss pathways are similar among sites. Model C uses steady-state 
assumptions to estimate N inputs using rock N content, tectonic uplift rates, and 
chemical weathering potential. Finally, model D uses laboratory weathering 
experiments, corrected for temperature with Arrhenius relationships. All models 
were implemented in R and results are based on 100,000 Monte Carlo simulations. 
We report 2.5th centiles, medians and 97.5th centiles of all model outputs, repre- 
senting a 95% confidence interval around our estimates. 

In model A we employ a simple mixing model to estimate the fraction of rock N 
inputs to the ecosystems ($f_{rock}$, equation (3)), using measured isotopic data and 
estimated pool sizes. The major assumption in this model is that there is no 
fractionating loss at either the BWDC or SFM site. We estimate an ecosystem 
¹⁵N/¹⁴N ($\delta^{15}N_{ecosystem}$) pool through the mixing of soil ($f_{soil}$) and biomass ($f_{plant}$) 
pools. The relative size of the soil versus biomass pools is difficult to quantify for 
these sites; however, we can use estimates of forest ecosystem carbon pools® and 
the C/N of plant biomass pools***® to predict a range of possible relative pool 
sizes. Using these constraints, we calculate that 0.07-0.40 of the ecosystem N 
reservoir is contained within plant biomass, but larger values most probably 
represent forests where soil C and N storage is low. In our model we use a range 
of 0.07-0.20 for the biomass contribution to the total ecosystem N pool, and 
assume that the ¹⁵N/¹⁴N of foliage is representative of the entire biomass pool 
($\delta^{15}N_{plant}$, equation (4)). The ¹⁵N/¹⁴N of the atmospheric endmember ($\delta^{15}N_{atm}$) 
is assumed to range from −1.5‰ to −0.5‰, and ¹⁵N/¹⁴N of the rock ($\delta^{15}N_{rock}$) 
and soil ($\delta^{15}N_{soil}$) components in the model represent the mean ± s.e.m. for each 
pool. 


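$$f_{rock} = \frac{\delta^{15}N_{ecosystem} - \delta^{15}N_{atm}}{\delta^{15}N_{rock} - \delta^{15}N_{atm}} \qquad (3)$$


$$\delta^{15}N_{ecosystem} = f_{soil}(\delta^{15}N_{soil}) + f_{plant}(\delta^{15}N_{plant}) \qquad (4)$$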
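
To illustrate how the mixing model of equations (3) and (4) could be combined with Monte Carlo sampling, the sketch below draws the biomass fraction and the atmospheric endmember from the ranges stated in the text and reports the 2.5th, 50th and 97.5th centiles. The soil, plant and rock isotope values, the uniform distributions, the assumption that soil and plant fractions sum to one, and all function names are illustrative assumptions, not values or code from the study.

```python
import random


def ecosystem_d15n(f_plant, d15n_soil, d15n_plant):
    """Equation (4), assuming f_soil + f_plant = 1."""
    return (1.0 - f_plant) * d15n_soil + f_plant * d15n_plant


def rock_fraction(d15n_ecosystem, d15n_atm, d15n_rock):
    """Equation (3): two-endmember mixing between atmosphere and rock."""
    return (d15n_ecosystem - d15n_atm) / (d15n_rock - d15n_atm)


def simulate(n_draws=10_000, seed=1):
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        f_plant = rng.uniform(0.07, 0.20)   # range given in the text
        d15n_atm = rng.uniform(-1.5, -0.5)  # range given in the text
        # Hypothetical pool values (per mil), for illustration only
        eco = ecosystem_d15n(f_plant, d15n_soil=3.0, d15n_plant=0.0)
        draws.append(rock_fraction(eco, d15n_atm, d15n_rock=4.0))
    draws.sort()
    pick = lambda q: draws[min(len(draws) - 1, int(q * len(draws)))]
    return pick(0.025), pick(0.5), pick(0.975), draws


low, median, high, all_draws = simulate()
print(low, median, high)
```

Reporting centiles of the sorted draws mirrors the way the interval around each model output is described, without assuming any particular distribution for the result.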


Model B differs from model A in that it incorporates a fractionating loss term 
whereby isotopically light N is removed preferentially from the ecosystem (equation 
(5)). Loss pathways include both leaching ($f_{leaching}$) and denitrification ($f_{denit}$); 
however, evidence indicates that the isotope effect of leaching is approximately 
zero when integrated through space and time (equation (6)), leaving denitrification 
as the sole fractionating loss pathway in the ecosystem. To consider isotopic 
effects of N losses, we make two simplifying assumptions. First, we assume that 
the contribution of N from rock is effectively zero at the BWDC forest. Second, we 
assume that the relative imprint of denitrification on the total ecosystem 
pool is the same between sites, given the similarity of climate, vegetation 
and soil physical properties between SFM and BWDC. In doing so, we combine 
the fraction and isotope term in the denitrification pathway into a single variable, 
$\eta$ (equation (7)). Using the assumptions from equations (6) and (7), we can 
solve for $\eta$ (equation (8)) and the fraction of rock inputs at the SFM site (equation 
(9)): 


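$$\delta^{15}N_{ecosystem} = f_{atm}(\delta^{15}N_{atm}) + f_{rock}(\delta^{15}N_{rock}) - f_{denit}(\varepsilon_{denit}) - f_{leaching}(\varepsilon_{leaching}) \qquad (5)$$

$$\varepsilon_{leaching} = 0 \qquad (6)$$

$$\eta = f_{denit}(\varepsilon_{denit}) \qquad (7)$$

$$\eta = \delta^{15}N_{atm} - \delta^{15}N_{BWDC} \qquad (8)$$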
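
Equation (9) does not survive in this copy of the text. As a hedged sketch only, and not the authors' published expression, one form consistent with equations (5) to (8) follows if the atmospheric and rock fractions at SFM are assumed to sum to one (our assumption), with $\delta^{15}N_{SFM}$ and $\delta^{15}N_{BWDC}$ denoting the ecosystem values at the two sites:

```latex
\begin{align*}
\delta^{15}N_{SFM} &= (1 - f_{rock})\,\delta^{15}N_{atm}
                      + f_{rock}\,\delta^{15}N_{rock} - \eta \\
f_{rock} &= \frac{\delta^{15}N_{SFM} - \delta^{15}N_{atm} + \eta}
                 {\delta^{15}N_{rock} - \delta^{15}N_{atm}} \\
         &= \frac{\delta^{15}N_{SFM} - \delta^{15}N_{BWDC}}
                 {\delta^{15}N_{rock} - \delta^{15}N_{atm}}
\end{align*}
```

The last line substitutes $\eta$ from equation (8). Under these assumptions the denitrification imprint cancels, and the rock fraction depends on the isotopic offset between the two sites.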

For model C we estimated the rock N contribution to SFM and BWDC ecosystems 
from regional uplift rates and atmospheric N inputs. We assumed that uplift equals 
denudation rates at our sites and that the chemical weathering fraction is between 
0.05 and 0.2 of the total denudation rates, on the basis of estimates from the Sierra 
Nevada®®. Total atmospheric N inputs are between 4 and 8 kg ha⁻¹ yr⁻¹ at both 
sites, on the basis of regional estimates of N fixation®!”* and N deposition’®. 
Quaternary uplift rates in northwestern California are estimated to be between 
0.001 and 0.004 m yr⁻¹, on the basis of geodynamic models” and field data®”’””. 
The fraction of rock N input to the ecosystem (f,ock) can then be estimated from 
equation (10): 


$$f_{rock} = r_u\, f_{cw}\, (10{,}000\ \mathrm{m^2\,ha^{-1}})\, \rho_R\, N_{rock}\, [1/(1 + i_{atmos})] \qquad (10)$$

where $r_u$ is uplift rate (m yr⁻¹), $f_{cw}$ is the chemical weathering fraction, $\rho_R$ is rock 
density (kg m⁻³), $N_{rock}$ is the N content of bedrock, and $i_{atmos}$ is the total atmospheric 
N input (kg ha⁻¹ yr⁻¹). 

In model D we estimate rock N inputs at SFM on the basis of laboratory 
weathering data’. Bulk leaching experiments estimated bedrock N fluxes to be 
between 5 and 38 kg ha⁻¹ yr⁻¹ after 360-day incubations at 20 °C. Mineral 
dissolution rates are sensitive to temperature*”””’; we therefore estimate field N 
fluxes on the basis of laboratory weathering rates corrected for differences in 
temperature between the laboratory (20°C) and field (9°C), using Arrhenius 
relationships. Activity coefficients for potassium are used in place of ammonium, 
given the similarity in charge and ionic radius of ammonium and large-ion lithophiles 
in geochemical systems”. The field weathering rate ($r_{field}$) is estimated from 
equation (11): 


$$r_{field} = r_{lab} \exp\left[-\frac{E_a}{R}\left(\frac{1}{T_{field}} - \frac{1}{T_{lab}}\right)\right] \qquad (11)$$


where $E_a$ is the dissolution activation energy, $R$ is the gas constant, $r_{lab}$ is the laboratory 
weathering rate, and $T_{field}$ and $T_{lab}$ are field and laboratory temperatures, 
respectively. 
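
The temperature correction can be sketched as follows, using the standard Arrhenius ratio between two temperatures. The activation energy of 60 kJ mol⁻¹ is a hypothetical placeholder, since the text does not state the value used, and the function name is ours.

```python
import math

GAS_CONSTANT = 8.314  # J mol^-1 K^-1


def field_rate(lab_rate, activation_energy_j_mol, t_field_c, t_lab_c):
    """Scale a laboratory weathering rate to field temperature
    with the Arrhenius relationship (temperatures given in Celsius)."""
    t_field_k = t_field_c + 273.15
    t_lab_k = t_lab_c + 273.15
    exponent = -(activation_energy_j_mol / GAS_CONSTANT) * (
        1.0 / t_field_k - 1.0 / t_lab_k
    )
    return lab_rate * math.exp(exponent)


# Hypothetical activation energy; 20 C laboratory and 9 C field are from the text.
HYPOTHETICAL_EA = 60_000.0  # J mol^-1
for lab_flux in (5.0, 38.0):  # laboratory flux bounds from the text
    print(lab_flux, field_rate(lab_flux, HYPOTHETICAL_EA, 9.0, 20.0))
```

Because the field site is cooler than the laboratory, any positive activation energy yields a field rate below the laboratory rate; the size of the reduction depends strongly on the activation energy chosen.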

Methods for statewide soil C and N analysis. A total of 183 pedons from the 
University of California - Davis Soil library were selected for the analysis, on the 
basis of vegetation data (mixed conifer and Douglas fir). Soils with andic prop- 
erties were identified and excluded from the analysis by using the database soil 
taxonomy and through comparison with SSURGO” soil survey data at the pedon 
location. The aqp”’ package in R** was used to calculate the depth-integrated C and 
N content for each pedon at intervals of 1 cm. Calculations of carbon and nitrogen 
soil storage were based on the horizon-integrated N and C content, the volume of 
coarse fragments, and the soil bulk density. The bulk density data from the database 
were incorrect or missing for several pedons; to correct for this, we assumed 
that bulk density increased linearly from 1,000 kg m⁻³ at the soil surface to 
1,200 kg m⁻³ at 30 cm. 
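
A minimal sketch of the storage calculation described above, summing 1-cm slices with the assumed linear bulk density profile. Holding bulk density at 1,200 kg m⁻³ below 30 cm, evaluating each slice at its midpoint, the input layout and the example profile are our assumptions, not details given in the text.

```python
def bulk_density(depth_cm):
    """Assumed bulk density (kg m^-3): linear from 1,000 at the surface
    to 1,200 at 30 cm, held constant below (our assumption)."""
    depth = min(max(depth_cm, 0.0), 30.0)
    return 1000.0 + 200.0 * depth / 30.0


def soil_stock(concentrations, coarse_fractions):
    """Element stock (kg m^-2) from 1-cm slices.

    concentrations: element mass fraction (kg per kg soil) per slice
    coarse_fractions: volume fraction of coarse fragments per slice
    """
    slice_thickness_m = 0.01
    stock = 0.0
    for i, (conc, coarse) in enumerate(zip(concentrations, coarse_fractions)):
        midpoint_cm = i + 0.5
        fine_earth_mass = bulk_density(midpoint_cm) * slice_thickness_m * (1.0 - coarse)
        stock += conc * fine_earth_mass
    return stock


# Hypothetical pedon: 30 cm deep, 2% carbon throughout, no coarse fragments
example_stock = soil_stock([0.02] * 30, [0.0] * 30)
print(example_stock)
```

Multiplying by 10 converts kg m⁻² to Mg ha⁻¹ if the result needs to be compared with the above-ground carbon values.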

An earlier origin for the Acheulian 


Christopher J. Lepre, Hélène Roche, Dennis V. Kent, Sonia Harmand, Rhonda L. Quinn, Jean-Philippe Brugal, 
Pierre-Jean Texier, Arnaud Lenoble & Craig S. Feibel 


The Acheulian is one of the first defined prehistoric techno- 
complexes and is characterized by shaped bifacial stone tools’ ’. 
It probably originated in Africa, spreading to Europe and Asia 
perhaps as early as ~1 million years (Myr) ago**. The origin of 
the Acheulian is thought to have closely coincided with major 
changes in human brain evolution, allowing for further technolo- 
gical developments’*. Nonetheless, the emergence of the Acheulian 
remains unclear because well-dated sites older than 1.4 Myr ago are 
scarce. Here we report on the lithic assemblage and geological con- 
text for the Kokiselei 4 archaeological site from the Nachukui forma- 
tion (West Turkana, Kenya) that bears characteristic early Acheulian 
tools and pushes the first appearance datum for this stone-age tech- 
nology back to 1.76 Myr ago. Moreover, co-occurrence of Oldowan 
and Acheulian artefacts at the Kokiselei site complex indicates that 
the two technologies are not mutually exclusive time-successive com- 
ponents of an evolving cultural lineage, and suggests that the 
Acheulian was either imported from another location yet to be identified 
or originated from Oldowan hominins in this vicinity. In 
either case, the Acheulian did not accompany the first human dis- 
persal from Africa”’° despite being available at the time. This may 
indicate that multiple groups of hominins distinguished by separate 
stone-tool-making behaviours and dispersal strategies coexisted in 
Africa at 1.76 Myr ago. 

Sediments of the Nachukui formation exposed along the northwest 
shoreline of modern Lake Turkana in Kenya" (Fig. 1 and Supplemen- 
tary Fig. 1) preserve several rich archaeological site complexes, among 
which is Kokiselei. Thus far, this complex is defined by ten sites, eight 
of which are found within a discrete 5-m interval of the middle part of 
the nearly 170-m-thick Kaitio member. Six of those sites have been 
tested or largely excavated but many other potential sites have been 
pinpointed. Most of these Kokiselei sites contain typical core/flake 
Oldowan assemblages’’. Kokiselei 4 (KS4), however, holds an exceptional 
lithic assemblage that confirms the co-occurrence of the 
Oldowan and Acheulian at this site complex, indicating that the behavioural 
repertoire of early hominins in the area incorporated both 
technologies. 

Figure 1 | Geological’! and location map. Note the latitude and longitude 
coordinates provided for each place of investigation. Section 7000 (SEC. 7000) 
was surveyed ~200 m along the ephemeral stream; hence the indications of the 
base and top are given. The other two, SEC. 7100 and SEC. 7400 (the latter of 
which is where the Acheulian site KS4 was excavated), derive from sheer 
outcrops, and the coordinates for only the top of the sections are given. 

The KS4 assemblage (Supplementary Fig. 2) is characterized by the 
presence of pick-like tools with a trihedral or quadrangular section, 
unifacially or bifacially shaped crude hand-axes, and a few cores and 
flakes, all derived from the same mudstone bed. A single subsurface, in 
situ origin for KS4 is ensured by excavations at the main test trench that 
recovered several spectacular sets of refitted lithic artefacts (Supplementary 
Fig. 3). With the exception of a few cores made on basalt, 
the rest of the assemblage has been knapped from large cobbles or 
tabular clasts of locally available aphiric phonolite’*. No vertebrate 
remains have been found within the test trenches, but the mudstone 
bed has yielded numerous non-human vertebrate fossils. The most 
frequent taxon is the large-sized hippopotamus Hippopotamus gorgops 
(adult and juvenile), but suids (Kolpochoerus, Metridiochoerus, 
Notochoerus?), rhinoceros (Ceratotherium sp.), equids (Equus and 
Hipparion) and a few carnivores (Panthera, Hyaenidae aff. Crocuta) 
are also present. Some isolated teeth and post-cranial elements represent 
bovids, and it is possible to identify Bovini (aff. Syncerus), Reduncini 
(Kobus sp.) and medium-sized Tragelaphini and Alcelaphini. 

Broadly speaking, the Acheulian tools of KS4 come from a 15-20-m-thick 
interbedded series of gravels, sands and mudstones colloquially 
referred to as the bird cliff beach complex (BCBC). Outcrops of the 
BCBC are part of a nearly continuous band of sediments belonging to 
the Kalochoro and Kaitio members that extends for over 7 km from 
northeast to southwest along the modern northwest shoreline of Lake 
Turkana, Kenya (Fig. 1). These members of the Nachukui formation 
record a transition from predominantly fluvial to lacustrine sedi- 
mentation in the Pleistocene Turkana basin during which the rest 
of East Africa is thought to have undergone ecosystem turnover in 
response to global climate forcing’*. A first indication of lacustrine 
deposits occurs ~35m above the base of the Kalochoro member, 
marked by the appearance of silty/clayey, often thinly bedded and 
finely laminated, ostracod-rich lacustrine mudstones, which coarsen 
upwardly into poorly sorted massive mudstones (Supplementary Figs 
8-11). Thin lenticular units of gravel and sand become apparent near 
the top of the Kalochoro member, which heralds the appearance of the 
BCBC in the Kaitio member. Lithological units of the BCBC form 
metre-thick, coarsening-upward cycles of claystone, siltstone and sand 
and/or gravel. At KS4, the coarsest layers of the BCBC consist of 
gravelly sands that preserve abundant rhizoconcretions at their tops 
(Supplementary Figs 10 and 11). Claystones of the cycles contain thin 
lenses of mollusc shells and are dissected by slickensided fractures that 
define wedge-shaped aggregates of palaeosols. These claystones, as well 
as the siltstone, can be very poorly sorted in places and include volcanic 
granules/pebbles. Such lithostratigraphic and sedimentary patterns 
suggest a dynamic environment along a palaeo-lakeshore. 

In the Kokiselei region the erosive base of the BCBC occurs varyingly 
at 1-15m above the Kaitio member’s lowermost stratigraphic level, 
which is the KBS tuff'' dated by ⁴⁰Ar/³⁹Ar to 1.869 ± 0.021 Myr ago”’. 
The KBS tuff lies 78 m above the Kalochoro tuff, dated by ⁴⁰Ar/³⁹Ar to 
2.331 ± 0.015 Myr ago’®, which defines the base of the Kalochoro 
member’. No direct numerical age constraints have been determined 
for the BCBC; however, regional geological mapping and tephrostrati- 
graphic studies indicate that it is ~150 m below the base of the Lower 
Koobi Fora tuff! dated by ⁴⁰Ar/³⁹Ar to 1.476 ± 0.013 Myr ago". 
Linear extrapolation from the Kalochoro and KBS tuffs and linear 
interpolation from the KBS and Lower Koobi Fora tuffs broadly con- 
strain the KS4 Acheulian assemblage to between 1.72 and 1.81 Myr 
ago. Other dated tuffaceous beds between the KBS and Lower Koobi 
Fora tuffs in the Turkana basin that would further refine stratigraphic 
position have not been found in the Kokiselei region. To place the KS4 
artefacts within a more constrained age context, we collected 148 
orientated samples for palaeomagnetic analysis (see Supplementary 
Methods and Supplementary Figs 4-7) from sections of outcrop 
exposed at this archaeological site and adjacent locations (Fig. 1). 

Figure 2 | Summary diagram. Left: members and tephrochronology 
lithostratigraphy (brown, tuff/bentonite; buff, mudstone; small circles, sand; 
large circles, gravel) and virtual geomagnetic pole (VGP) latitudes (primary 
group (black symbols, n = 129) consists of reliable characteristic remanent 
magnetization directions; secondary group (red symbols, n = 19), unreliable) 
of the examined interval of the Nachukui formation. Right: the Early Stone Age 
in eastern Africa (refs 2, 3, 8, 12, 19, 25, 26 and this study) referenced to the 
geological ages of the oldest known out-of-Africa sites with hominin fossils”'® 
and the reversal chronology and age scale of the geomagnetic polarity timescale 
(GPTS)'”™. 


1 SEPTEMBER 2011 | VOL 477 | NATURE | 83 


©2011 Macmillan Publishers Limited. All rights reserved 




Our palaeomagnetic results allow us to recognize three main polarity 
intervals for the overall composite magnetostratigraphy from the 
examined interval for the Nachukui formation (Fig. 2). A lowest interval 
of almost entirely reverse polarity extends from 7 m to 63.5 m above the 
base of the Kalochoro member/tuff. The following interval of mostly 
normal polarity begins at 63.5m above the base of the Kalochoro 
member/tuff and extends to approximately 100m above the base. 
Overlying this long mostly normal magnetozone is a 5-m-thick interval 
of exclusively reverse polarity. ⁴⁰Ar/³⁹Ar dating of the Kalochoro tuff 
(~2.33 Myr ago) and the KBS tuff (~1.87 Myr ago) permits us to 
correlate accurately our magnetostratigraphy to the geomagnetic 
polarity timescale (GPTS) (Fig. 2). The polarity reversal at 63.5 m that 
is bracketed by these tuffs most probably represents the transition from 
the reverse Matuyama chron to the ensuing normal Olduvai subchron. 
Accordingly, the polarity reversal at ~100m correlates with the sub- 
sequent transition from the normal Olduvai subchron to the ensuing 
part of the reverse Matuyama chron. This implies that the long, mostly 
normal magnetozone from 63.5 m to 100 m is the entire Olduvai sub- 
chron. Thus, the Olduvai in the Nachukui formation is evidently 36.5 m 
thick. An additional indication that we have located the extent of the 
Olduvai subchron in the Nachukui formation comes from the excellent 
agreement between the sedimentation rates for the overlapping intervals 
of the Kalochoro to KBS tuffs (~17 cm kyr⁻¹) and the identified base to 
top of the Olduvai subchron (~22 cm kyr⁻¹). If our outcrop sampling 
strategy was compromised by poor correlations, not enough vertical 
stratigraphic coverage, or unconformities, for example, then these two 
sets of independently derived sedimentation rates would be much more 
divergent, which is not the case—in fact, the experimental error asso- 
ciated with the radio-isotopic dates of the tuffs makes the two sets of 
sedimentation rates empirically indistinguishable. 
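
The two sedimentation rates quoted above can be checked with simple arithmetic. The sketch below uses the stratigraphic heights and ages given in the text and in the Methods Summary; the function and variable names are ours.

```python
def sedimentation_rate_cm_per_kyr(thickness_m, older_age_myr, younger_age_myr):
    """Mean accumulation rate between two dated levels."""
    duration_kyr = (older_age_myr - younger_age_myr) * 1000.0
    return thickness_m * 100.0 / duration_kyr


# Kalochoro tuff (0 m, 2.331 Myr) to KBS tuff (78 m, 1.869 Myr)
tuff_rate = sedimentation_rate_cm_per_kyr(78.0, 2.331, 1.869)

# Base (63.5 m, 1.945 Myr) to top (100 m, 1.778 Myr) of the Olduvai subchron
olduvai_rate = sedimentation_rate_cm_per_kyr(100.0 - 63.5, 1.945, 1.778)

print(round(tuff_rate), round(olduvai_rate))  # 17 22
```

The two rates come from independent age controls (radio-isotopic dates versus polarity boundaries), which is why their similarity supports the correlation.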

At 28.5 and 32m beneath the base of the Olduvai, our findings 
indicate the presence of two stratigraphic levels—each represented 
by one specimen—with positive inclinations and northerly virtual 
geomagnetic pole (VGP) latitudes that may correlate with the 
Reunion subchron. The positive inclination associated with the lower 
of the two levels might be the result of incomplete removal of a high-temperature 
magnetic component of specimen P022a carried by hematite; 
however, the normal polarity associated with the upper level 
derives from well-resolved data of specimen 715-3a, which is more 
likely to represent the Reunion subchron (2.128-2.148 Myr ago’’). 
Other work on the Turkana basin sequences and correlative deposits 
in southwest Ethiopia has documented the Reunion subchron occur- 
ring as two normal polarity intervals’. 

Our results reveal a complex, fine-scale pattern characterized by 
several short polarity excursions near the Olduvai to Matuyama 
boundary that is not unique to this particular locality. It has been 
reported in this interval at other Turkana basin sequences”, in oceanic 
cores”*”' and at the former Plio-Pleistocene boundary and point stra- 
totype section at Vrica from the eastern Mediterranean sapropel 
sequences'”**”*, The origins of the fine-scale structure characteristic 
of the top of the Olduvai subchron found at Turkana (ref. 19 and this 
study) and elsewhere'”””™ remain unclear, but might relate to the 
drop in the intensity of the geomagnetic field associated with the 
polarity reversal. The weak magnetic remanence of the sediment 
deposited in this low-intensity field may be more prone to resetting 
or overprinting in younger and comparatively stronger fields. Recent 
work suggests that such polarity excursions observed at Vrica could 
also reflect oxidation effects on the iron-bearing grains in the sapropel 
sequences”. Considering that post-Nachukui formation alluvial sedi- 
ments mantle places of the landscape near KS4, some younger deposi- 
tional or perhaps soil processes may have caused magneto-chemical 
alterations that might contribute to some of the complexity. 
Nevertheless, the similar palaeomagnetic reversal character for the 
end Olduvai at sites in different global geographic settings offers an 
exceptional correlation tool and provides additional support for our 
magnetostratigraphic interpretations. The stratigraphic position of 
KS4 is 4.5 m above the Olduvai to Matuyama boundary. An age model 
based on a cubic spline fit to six chronostratigraphic tie-points 
(Kalochoro tuff, Reunion subchron, base of the Olduvai subchron, 
KBS tuff, top of the Olduvai subchron, and Lower Koobi Fora tuff) 
provides an estimate of 1.76 Myr ago for the KS4 Acheulian assem- 
blage (Fig. 3, see also Methods). This is more than 350 kyr older than 
the early Acheulian artefacts from Konso, Ethiopia’. 

An Acheulian origin dating back to 1.76 Myr ago is close in age to 
partial cranium KNM-ER 3733”, which is ostensibly the most defin- 
itive evidence for the antiquity of African Homo erectus sensu lato, 
considering it is arguably a more anatomically diagnostic specimen, 
and thus better understood taxonomically as compared to possible 
conspecific fossils older than 1.7 Myr’*. Several hypotheses link the 
development of the Acheulian with the initial evolution of H. erectus. 
This is mainly because of a similar geographic origin for the two*, and 
the large-brained species persisted when many, if not all other, Homo 
taxa went extinct*® during the evolution of the characteristically made 
Acheulian post-1.5-Myr ago. An earlier Acheulian origin, coeval with 
sympatric Homo species”, strengthens the possibility that more than 
one tool-making hominin existed at 2.0-1.5 Myr ago. 

Homo erectus is traditionally thought to be the first hominin to 
disperse from Africa, yet the oldest known out-of-Africa fossil homi- 
nin sites lack stone tools or preserve only Oldowan-style artefacts”. If 
indeed the first out-of-Africa hominin possessed Acheulian techno- 
logy, then it is expected that evidence of this techno-culture should also 
be found dispersed throughout the Old World. However, archaeolo- 
gical sites older than ~1 Myr preserving the Acheulian are not abun- 
dantly documented from the Middle East, Europe or Asia, and are 
younger than the oldest known out-of-Africa hominin fossil localities 
dated at 1.7 Myr ago* °°". Our data indicate that the earliest develop- 
ment of the Acheulian occurred in Africa at 1.76 Myr ago and was 
contemporaneous with or perhaps pre-dated the earliest hominin dis- 
persals into Eurasia. Yet, the difference between the ages for the oldest 
known Acheulian artefacts in the world from Africa and the oldest 
known Acheulian artefacts from Eurasia raises the likelihood that the 
first Eurasian hominins derived from an African population lacking 
Acheulian culture. Potentially, two hominin groups coexisted in Africa 
at 1.76 Myr ago. One of these groups could have developed the 
Acheulian technology but remained in Africa. The other could have 
lacked the cognitive ability and/or technological knowledge to manufacture 
the Acheulian technology and did not carry it into Eurasia. This 
division may indicate different behavioural aptitudes for separate 
African species (for example, H. erectus sensu lato versus Homo habilis 
sensu lato) or a within-species cultural disparity. In any event, it seems 
that a second hominin dispersal with Acheulian technology or a diffusion 
of this technology took place later, leading to the widespread 
occurrence of this Early Stone Age tradition in the circum-Mediterranean 
area and elsewhere after ~1 Myr ago*”. 

Figure 3 | Age model. Cubic spline curve fitted to stratigraphic levels above 
the Kalochoro tuff versus age of magnetostratigraphic subchrons’””* (open 
circles) and dated tuffs'*’° (filled circles); dashed line shows how the 
stratigraphic level of KS4 was used to derive its age (1.76 Myr ago) from the 
spline curve. 


METHODS SUMMARY 


Orientations by magnetic compass and clinometer of planar faces were marked 
before removing hand-cut blocks from outcrops. Samples were taken at one-metre 
intervals or as the occurrence of fine-grained strata permitted. At least one inde- 
pendent block sample was taken from each interval resulting in 148 independent 
samples, from which one or more specimens were cut for processing. Magnetic 
remanence measurements were made with a 2G Model 760 DC-SQUID rock 
magnetometer in the shielded room of the Paleomagnetics Laboratory at 
Lamont-Doherty Earth Observatory. Natural remanent magnetizations of all sample 
specimens were subjected to progressive thermal demagnetization using an initial 
step of 100°C, seven steps at 50°C increments to 450 °C, and five steps at 25°C 
increments to 575°C. Magnetic susceptibility values were determined with a 
Bartington MS2B instrument initially and after each heating step to monitor for 
magneto-chemical alteration. Virtual geomagnetic pole (VGP) latitudes were calcu- 
lated from the characteristic remanent magnetization (ChRM) directions deter- 
mined from principal component analysis’ and Zijderveld demagnetization 
diagrams* (Supplementary Table 1). Reliable ChRM directions are characterized 
by maximum angular deviation (MAD) values of less than 15° of a component 
that linearly converges towards the origin over five high-temperature steps. VGP 
latitude for each specimen was plotted in stratigraphic position to determine 
magnetostratigraphy. 
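
For readers unfamiliar with the VGP calculation, the sketch below shows the standard geocentric dipole relationship for VGP latitude from a site latitude and a ChRM direction. It is an illustration under that textbook assumption, not the laboratory's processing code; the function name and the 4° N site latitude are hypothetical.

```python
import math


def vgp_latitude(site_lat_deg, declination_deg, inclination_deg):
    """VGP latitude (degrees) from a site latitude and a remanence
    direction, assuming a geocentric dipole field."""
    lat = math.radians(site_lat_deg)
    dec = math.radians(declination_deg)
    inc = math.radians(inclination_deg)
    # Magnetic colatitude p from the dipole formula tan(I) = 2 / tan(p);
    # atan2 keeps p between 0 and 180 degrees for either polarity.
    p = math.atan2(2.0, math.tan(inc))
    s = math.sin(lat) * math.cos(p) + math.cos(lat) * math.sin(p) * math.cos(dec)
    s = max(-1.0, min(1.0, s))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.asin(s))


# Hypothetical site at 4 degrees N: the dipole-expected normal direction
# maps to a northerly pole, and its antipode to a southerly pole.
site_lat = 4.0
expected_inc = math.degrees(math.atan(2.0 * math.tan(math.radians(site_lat))))
normal_vgp = vgp_latitude(site_lat, 0.0, expected_inc)
reverse_vgp = vgp_latitude(site_lat, 180.0, -expected_inc)
print(normal_vgp, reverse_vgp)
```

Plotted against stratigraphic height, strongly positive values indicate normal polarity and strongly negative values reverse polarity.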

The numerical age of KS4 was estimated through a model of the rate of sediment 
accumulation for the deposits encasing the site using a cubic spline function 
through data for the Kalochoro tuff (0m, 2.331 Myr ago'*), Reunion subchron 
(35 m, midpoint 2.138 Myr ago'””), base of Olduvai subchron (63.5 m, 1.945 Myr 
ago’’**), KBS tuff (78m, 1.869 Myr ago’), top of Olduvai subchron (100m, 
1.778 Myr ago'’”*), and Lower Koobi Fora tuff (247 m, 1.476 Myr ago’’). This 
method places the KS4 Acheulian assemblage (104.5 m) at 1.76 Myr ago. 
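
The age model can be sketched with a cubic spline through the six tie-points listed above. The text does not state the spline's boundary conditions; the version below assumes a natural cubic spline (zero second derivative at both ends), so it is an approximation of the approach rather than a reproduction of the authors' fit. All names are ours.

```python
import bisect


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys).
    xs must be strictly increasing."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    # Tridiagonal system for the second derivatives m[i]; m[0] = m[n-1] = 0.
    a = [0.0] * n
    b = [1.0] * n
    c = [0.0] * n
    d = [0.0] * n
    for i in range(1, n - 1):
        a[i] = h[i - 1]
        b[i] = 2.0 * (h[i - 1] + h[i])
        c[i] = h[i]
        d[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    for i in range(1, n):  # forward elimination (Thomas algorithm)
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        d[i] -= w * d[i - 1]
    m = [0.0] * n
    m[n - 1] = d[n - 1] / b[n - 1]
    for i in range(n - 2, -1, -1):  # back substitution
        m[i] = (d[i] - c[i] * m[i + 1]) / b[i]

    def evaluate(x):
        i = max(0, min(n - 2, bisect.bisect_right(xs, x) - 1))
        t0 = xs[i + 1] - x
        t1 = x - xs[i]
        return ((m[i] * t0 ** 3 + m[i + 1] * t1 ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * t0
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * t1)

    return evaluate


# Tie-points from the text: height above the Kalochoro tuff (m), age (Myr)
heights = [0.0, 35.0, 63.5, 78.0, 100.0, 247.0]
ages = [2.331, 2.138, 1.945, 1.869, 1.778, 1.476]

age_model = natural_cubic_spline(heights, ages)
ks4_age = age_model(104.5)
print(round(ks4_age, 2))
```

Under the natural-spline assumption the estimate for the 104.5 m level lands close to the 1.76 Myr reported in the text; a different boundary condition or a smoothing spline would shift it slightly.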

Transient dynamics of an altered large marine ecosystem 


Kenneth T. Frank, Brian Petrie, Jonathan A. D. Fisher & William C. Leggett 


Overfishing of large-bodied benthic fishes and their subsequent 
population collapses on the Scotian Shelf of Canada’s east coast’? 
and elsewhere** resulted in restructuring of entire food webs now 
dominated by planktivorous, forage fish species and macroinver- 
tebrates. Despite the imposition of strict management measures in 
force since the early 1990s, the Scotian Shelf ecosystem has not 
reverted to its former structure. Here we provide evidence 
of the transient nature of this ecosystem and its current return 
path towards benthic fish species domination. The prolonged 
duration of the altered food web, and its current recovery, was 
and is being governed by the oscillatory, runaway consumption 
dynamics of the forage fish complex. These erupting forage species, 
which reached biomass levels 900% greater than those prevalent 
during the pre-collapse years of large benthic predators, are now in 
decline, having outstripped their zooplankton food supply. This 
dampening, and the associated reduction in the intensity of pre- 
dation, was accompanied by lagged increases in species abundances 
at both lower and higher trophic levels, first witnessed in zooplank- 
ton and then in large-bodied predators, all consistent with a return 
towards the earlier ecosystem structure. We conclude that the 
reversibility of perturbed ecosystems can occur and that this bodes 
well for other collapsed fisheries. 

The recent demonstration that overfishing of large-bodied preda- 
tors in the northwest Atlantic initiated a trophic cascade, typified by 
reciprocal changes in biomass between adjacent trophic levels extend- 
ing to the base of the food web'”, overturned the long-held view that 
large marine ecosystems are resistant to restructuring’. It has been 
proposed®’ that such trophic cascades are characteristic of ecosystems 
that have been transformed into undesirable states involving large 
changes in ecological functions and/or economic resources*’. 
Although the excessive consumption characteristic of trophic cascades 
may be unstable®, whether, how, and on what time scales such altered, 
diverse food webs and their key species and functional groups will 
recover remains unknown'**. This has led to controversy regarding 
the efficacy of and experimentation with strategies based on conven- 
tional management approaches such as moratoria on exploitation, 
culling and re-stocking intended to return ecosystems to their former 
structure’’’. Using four decades of high quality, annual, fishery- 
independent data (see Methods) representative of multiple trophic 
levels on the eastern Scotian Shelf (Supplementary Fig. 1), we docu- 
ment the transient nature of its altered ecosystem and its return 
towards dominance by large-bodied predators. 

The collapse of the northwest Atlantic cod (Gadus morhua) and 
several other large predatory fishes in the early 1990s (Fig. 1a), caused 
principally by over-fishing’*”’, precipitated the first documented open 
ocean trophic cascade in a large marine ecosystem’. The total biomass 
of cod, one of the ecosystem’s dominant species, has hovered at less 
than 5% of pre-collapse levels for almost two decades despite the 
implementation of strict regulations forbidding their capture’’. 

Recent investigations'*'® have provided strong evidence that, fol- 
lowing these collapses, the eastern Scotian Shelf, and other northwest 
Atlantic ecosystems in which similar collapses occurred, moved to
apparent alternate states in which planktivorous forage fishes and
macroinvertebrates became the dominant predators’. Released from 
predation on the eastern Scotian Shelf, the biomass of forage fishes 
increased by 900% (Fig. 1b) and macroinvertebrates by 200% com- 
pared to the pre-collapse years’*. They then competed directly with 
and/or preyed upon the early life stages of their once benthic predators, 
a phenomenon termed predator-prey reversal’’ which seems to be one 
of the leading causes of the delayed recovery of the benthic fish com- 
plex in this and other large marine ecosystems*’”"*. Although forage 
fish constitute approximately half the diet of an expanding, resident 
grey seal (Halichoerus grypus) population, estimates of their consumption
of pelagic fish species (1995-2000) were only 35% of the benthic
fish complex" and insufficient to suppress the outbreaks and biomass
variability of forage fishes (Fig. 1b).

Figure 1 | Variability of the eastern Scotian Shelf ecosystem. a-e, Data
(Supplementary Fig. 1) based on large-bodied benthic fish (a), their prey
(forage fishes, b) with estimated carrying capacity (dashed line), benthic fish
exploitation history expressed as annual per cent biomass removal
(c), changing ecosystem structure based on the leading mode (PCA1) of biotic
data spanning four trophic levels and the demarcation of regimes” of 22, 14 and
4 years duration (pink solid line) (d), and temperatures with averages shown for
the three regimes (dashed vertical lines) (e). Vertical bars in a and b show
±s.e.m. (n = 27).

The changing status of the eastern Scotian Shelf ecosystem exhibited 
two transitions. A period of intensive fishing, when aggregate landings 
of benthic fishes averaged close to 105,000 tonnes (t) representing 
annual biomass removals of >50% (Fig. 1c), resulted in the first transi- 
tion centred in 1991-1992 from a ‘pre-collapse’ state dominated by 
benthic fish species to a ‘collapsed’ state dominated by forage fish 
species’. A cod and haddock fishing moratorium implemented in 
1993 had the desired effect of reducing aggregate exploitation (<5% 
since 2000; Fig. 1c) but did not produce the anticipated recovery 
(Supplementary Figs 2 and 3). The second transition, centred in 
2005-2006, represents a return towards a ‘recovering’ state of benthic 
fish domination described here (Fig. 1d and Methods). An additional 
bottom trawl survey, beginning in 1986, revealed a similar pattern of 
collapse and recent increase in benthic fish biomass (Supplementary 
Fig. 4). 

The physical environment during the three states, assessed using 
annual bottom temperatures, 0-50 m water temperatures (Fig. 1e) and
water column stratification (Supplementary Information), showed 
only minor changes. The bottom temperatures during the collapsed 
state were only 0.33 °C and 0.24 °C lower than during the pre-collapse 
and recovering periods, respectively. The magnitude of this temper- 
ature change would have only minimal or no effect on individual and 
population growth rates as well as other life history traits (Methods). 
Further, the dominant large-scale atmospheric forcing mechanism in 
the western North Atlantic (that is, the North Atlantic Oscillation) 
induces a bimodal response of ocean temperatures with a nodal point 
in bottom temperature occurring in the middle of the eastern Scotian 
Shelf*’. Consequently, the temperature response to such forcing in this 
region is minimal and is reflected in the dampening of regional vari- 
ability in other biological properties such as species richness”°. 
Differences in water column temperature anomalies were also slight 
for the three periods: on average, temperatures during the pre-collapse 
and collapsed periods were within 0.1 °C, during the forage fish out- 
break temperatures were elevated by about 0.4 °C; overall, tempera- 
tures varied over a range of about 2°C. There was no relationship 
between water column temperatures and forage fish biomass at zero 
lag (Supplementary Fig. 5; correlation coefficient, r = 0.02) or for lags 
(forage fish biomass relative to temperature) up to four years 
(Methods). The minor increases in stratification occurred primarily 
during summer, outside of the peak period of phytoplankton produc- 
tion (Methods). The timing and magnitude of this ongoing recovery of 
the benthic fish complex was initiated and is being sustained by 
naturally induced changes in the dynamics of their former prey, and 
the resulting impact on the total ecosystem, more so than by external 
climatic influences (Fig. 1e).

The second and most recent ecosystem transition began with a marked 
decline in the biomass of the unfished, forage fish complex dominated by 
northern sand lance (Ammodytes dubius), capelin (Mallotus villosus), 
and Atlantic herring (Clupea harengus). The aggregate biomass of these 
species peaked in 1994 and 1999 at approximately 10 million t, which 
exceeded the estimated carrying capacity of 4.3 million t for the eastern 
Scotian Shelf ecosystem (Fig. 1b; Methods). Subsequently, their total 
biomass rapidly declined at an average rate of 0.5 million t per year to 
current levels near 3 million t. Such eruptions followed by crashes invol- 
ving fast growing, highly opportunistic species are known to occur in 
other ecosystems freed from predatory control”!*. 

Physiological changes and cascading food web effects associated 
with the overshoot of the pelagic forage fish were evident. Relative 
weights, an index of physiological condition available since 1970, of 
the three dominant forage fish species showed coherent changes (31 
out of 41 years with same sign) throughout the entire period with 
relative high condition indices from the 1970s to the early 1990s fol- 
lowed by sustained declines beginning about 1994 (Fig. 2a). This 
points to the pelagic species having inadequate food resources at
increased abundances. The density of large-bodied zooplankton,
which has varied inversely (r = −0.32, n = 22 years) with the forage
fish biomass (their principal predators), reached a broad minimum of
about 17 individuals per m³ at the approximate peak (1994) of the
forage group biomass, a signature of excessive grazing. Large-bodied
zooplankton species increased rapidly (1997-2007) by a factor of four
in six to seven years to levels observed before the forage fish eruption,
the final year in the series being an exception (Fig. 2b). Moreover, the
standing stock of phytoplankton, which has varied inversely with the
abundance of large-bodied zooplankton (r = −0.71), declined by 40%
during the ongoing recovery of the benthic fish complex. These reciprocal
relationships between adjacent trophic levels are consistent with
the trophic cascade model".

Figure 2 | Physiological changes in forage fish species and resultant food
web effects. a, Species-specific body weight anomalies (stacked histograms)
and smoothed forage fish biomass (solid line; 30% LOESS; Supplementary
Fig. 8). b, Time series of large-bodied zooplankton abundance and
phytoplankton colour index (a measure of abundance) from the Continuous
Plankton Recorder survey (http://www.sahfos.ac.uk). Recent (2000-2007)
increases in large-bodied zooplankton, coincident with major reductions in
forage fish biomass shown in a, indicate a weakening of the inhibitory effects of
the predator-prey reversal mechanism to benthic fish species recovery.

Most revealing is the fact that the trajectory of forage fish biomass 
changes following the benthic fish collapse approximates a damped 
harmonic oscillator (Fig. 3), providing key biological insights from 
the derived parameter values including estimates of the period, 5 years 
(the approximate life cycle of this functional group), and dampening 
time, 7 years. The 7-year dampening represented a 78% decrease in 
forage fish biomass between 1994, the time of the peak amplitude, 
and 2005, the end of the regime dominated by this group (Fig. 1d). 
Studies of terrestrial herbivores indicate that the cycle period is strongly 
dependent on body size”’; based on the range of body sizes of the forage 
fishes we studied (weight range: 0.04-0.3 kg), the cycle period is similar
to those estimated from outbreaks of similar-sized terrestrial species (3.6
to 5.9 years)””. This dynamic, oscillatory behaviour indicates that the
internal damping capacity of the ecosystem, and not solely management
strategies, probably initiated the return of the eastern Scotian Shelf
ecosystem towards its former structure and the restoration of food
web stability.

Figure 3 | Post-perturbation forage fish variability. The variability (solid
points ± s.e.m., n = 27), described by a damped harmonic oscillator fitting the
1994-2001 observations (solid line), gives an amplitude of 6,100 × 10³ t,
dampening time = 6.9 years, and oscillation period = 5.2 years (Methods).
Shaded area indicates period of greatest population growth. Aggregate prey
biomass is expressed as an anomaly from the 30% LOESS filter (Supplementary
Fig. 8). This solution is extrapolated to the end of the record (broken line). In
theory”, a population or functional group that is characterized by damped
oscillations will overshoot and then undershoot the carrying capacity.

We propose that this crash in forage fish biomass led to a reduction 
in the intensity of predator-prey reversal which, combined with 
increased food availability to the benthic fish larval stages resulting 
from the corresponding and related growth in the abundance of large 
zooplankton, provided a window of opportunity for the recovery of the 
once dominant benthic predators. Since 2006 the benthic fish biomass 
has attained levels approaching those observed during the pre-collapse 
period (Fig. 1a). Atlantic cod and redfish (Sebastes spp.) have reached 
levels not seen since the early 1990s and haddock (Melanogrammus 
aeglefinus) to an unprecedented high (Supplementary Fig. 6). Enhanced 
recruitment (by a factor of 5.3, recovering/pre-collapse; Supplementary 
Fig. 7) and improvements in post-recruitment survivorship, which for 
cod and haddock has increased by factors of 12 and 70 times compared 
to the collapsed period, contributed to these changes (Supplementary 
Table 1). In addition, three of the four benthic fish species which are 
routinely aged (cod, pollock and silver hake) have shown 8-18% 
increases in average weight at age during 2006-2010 relative to the 
1992-2005 post-collapse period (Supplementary Table 2). 

The generally positive response of the large-bodied zooplankton to 
the declining forage fish biomass supports this hypothesis. The early 
life stages of most of the benthic fish species form an integral part of the 
large-bodied zooplankton complex and their survival and contribution 
to recruitment would have benefited from the same forces that have led 
to the increase in large-bodied zooplankton abundance noted above. 
Increased predation by the expanding benthic fish complex on the 
forage fish community should accelerate this trend. 

This unfolding drama held many surprises, including the prolonged 
recovery of the benthic predator complex, despite the moratorium on 
directed fishing for cod and haddock, the establishment of a closed area
on the western offshore banks that preceded the fishing moratorium”, 
and the promotion of new and experimental fisheries” to divert fishing 
effort away from the traditional species. Although the current trajectory 
is positive, several factors could alter ongoing ecosystem recovery. The 
current high levels of recruitment and survivorship of the benthic fish 
complex, if sustained, could accelerate the recovery. The current 
dominance of haddock over cod also raises the question of whether 
the species makeup of the ecosystem will return to that which prevailed 
before the collapse. Furthermore, recovery in other over-exploited 
ecosystems such as the Black Sea, Northern Benguela, the Sea of
Japan, and elsewhere has been delayed by jellyfish blooms”, the pres- 
ence of invasive species and by eutrophication’, all of which are pos- 
sible in the system we describe. The widespread body size reductions of 
benthic fishes documented for other exploited northwest Atlantic systems*°”’,
if not reversed, could also slow the recovery of the benthic fish
complex and adversely affect food web structure’. The evolving global 
climate could alter the ecosystem positively or negatively. 

These uncertainties notwithstanding, the answer to the critical ques- 
tion of whether or not such profound changes in the dynamics of large 
marine ecosystems are reversible seems to be ‘yes’. This bodes well for 
other perturbed, formerly cod-dominated systems at latitudes to the 
north of the eastern Scotian Shelf that have yet to recover. Indeed, 
subtle signs of cod recovery have been appearing in other sub-arctic 
northwest Atlantic ecosystems during the past few years”*. However, 
the time scales for their recovery are likely to be greater given the lower 
water temperatures (equates to slower turnover times) and their 
reduced species richness and, for some, because of the continued 
exploitation of cod and other large-bodied benthic fish species'*!*"*. 


METHODS SUMMARY 


Annual fishery-independent, randomly stratified bottom trawl surveys on the con- 
tinental shelf off eastern Nova Scotia (1970-2010) during July-August provided 
biomass and variance estimates for functional groupings of fifteen, commercially 
exploited benthic and three forage fish species. A March survey (1986-2010) was 
used to assess the benthic fish dynamics further. Benthic fish exploitation levels 
were expressed as the ratio of landings to survey-estimated biomass. Ageing data, 
available for four benthic fish species, were used to calculate growth and mortality 
rates. Correction factors were applied to the typically under-sampled forage species. 
A 30% linear LOESS filter was applied to the forage fish biomass data to isolate high- 
frequency variability; the resultant anomalies were least squares fit to a damped 
harmonic oscillator equation (Methods). We quantified the time-averaged ecosys- 
tem carrying capacity of forage fishes using seasonally averaged, zooplankton data 
and production to biomass ratios obtained from the literature (Methods). Annual 
anomalies of the individual average body weights (total biomass/total abundance), 
an index of physiological condition, were derived for each forage species. Lower 
trophic level data (phytoplankton colour index and zooplankton abundance time 
series) were obtained from the Continuous Plankton Recorder survey
(http://www.sahfos.ac.uk). Zooplankton were grouped into large (≥2 mm carapace length)
and small (<2 mm) species. The eastern Scotian Shelf ecosystem status was assessed 
by principal components analysis of the five biological time series. To determine 
whether and when ecosystem transitions occurred, the dominant principal com- 
ponent axis (PCA1) was subjected to a sequential t-test analysis of the regime shift 
method (STARS)”, which identifies the magnitude and direction of significant 
shifts. Temperature, salinity and density observations were obtained from directed 
and opportunistic ship-based sampling. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


Large-bodied benthic predators, aggregate forage fish biomass, zooplankton 
functional groups and chlorophyll. The primary data used in the analyses were 
from July-August surveys, initiated in 1970, that are conducted annually by the 
Department of Fisheries and Oceans (DFO) of the eastern Scotian Shelf 
(Supplementary Fig. 1), Canada*’ >’. Results from a secondary, fishery-independent 
bottom trawl survey beginning in 1986 and conducted during the month of March 
were used to assess the benthic fish biomass dynamics further. We consider this a 
secondary survey because of its reduced geographic coverage, different statistical 
design, and three missing/incomplete years in comparison to the July-August 
survey. 

The dominant, large-bodied predator species, designated the predator group, are 
Atlantic cod Gadus morhua (Lmax = 161 cm), haddock Melanogrammus aeglefinus
(87 cm), pollock Pollachius virens (112 cm), longfin hake Urophycis chesteri (71 cm),
silver hake Merluccius bilinearis (73 cm), white hake Urophycis tenuis (142 cm), red
hake Urophycis chuss (69 cm), redfish Sebastes spp. (60 cm), thorny skate Amblyraja
radiata (120 cm), spiny dogfish Squalus acanthias (196 cm), Greenland halibut
Reinhardtius hippoglossoides (82 cm), American plaice Hippoglossoides platessoides
(76 cm), winter flounder Pseudopleuronectes americanus (64 cm), witch flounder
Glyptocephalus cynoglossus (67 cm), and yellowtail flounder Limanda ferruginea
(59 cm). These fifteen species have been commercially exploited, often in mixed
fisheries, throughout the continental shelf in depths of less than 200 m. Ages are 
determined for Atlantic cod, haddock, pollock and silver hake; a growth model for 
redfish** applicable to the eastern Scotian Shelf stock and an age-length key for 
American plaice*® permitted approximations of abundance at age for these two 
species. The availability of age data permitted the estimation of total mortalities 
(Supplementary Table 1) and growth rates (Supplementary Table 2). 

The forage species group consists of three species: herring Clupea harengus, 
capelin Mallotus villosus, and northern sand lance Ammodytes dubius, which are 
under-sampled by the bottom trawl survey. Correction factors have been applied 
(capelin and sand lance by a factor of 200 and herring by 40 (ref. 37)). Commercial 
exploitation of these species has been relatively low (herring) or non-existent in 
this region. Annual anomalies (s.d. units) of individual average body weights (total 
biomass/total abundance) for each of the three forage species were estimated and 
used as indices of the temporal variation in physiological condition. 
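
The condition index described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the survey totals are invented, the function names are hypothetical, and the sample standard deviation (n − 1 denominator) is assumed because the text does not say which form was used.

```python
from statistics import mean, stdev


def mean_body_weight(total_biomass_kg, total_abundance):
    """Average individual weight = total biomass / total abundance."""
    return total_biomass_kg / total_abundance


def standardized_anomalies(values):
    """Express each value as a departure from the series mean, in s.d. units."""
    mu = mean(values)
    sd = stdev(values)  # assumption: sample standard deviation (n - 1)
    return [(v - mu) / sd for v in values]


# Hypothetical annual survey totals for one forage species (not data from the paper).
biomass_kg = [120.0, 150.0, 90.0, 60.0, 45.0]
abundance = [1000, 1200, 900, 800, 750]

weights = [mean_body_weight(b, n) for b, n in zip(biomass_kg, abundance)]
anomalies = standardized_anomalies(weights)
```

With these invented numbers the early years have positive anomalies and the later years negative ones, which is the kind of pattern the index is meant to expose.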

Lower trophic level data (phytoplankton colour index and zooplankton abund- 
ance time series) were obtained from Continuous Plankton Recorder (CPR) obser- 
vations collected on the Scotian Shelf beginning in 1961 at a nominal sampling 
interval of 1 month. The CPR instrument is towed behind a vessel at about 7 m
depth; plankton are collected on a 270 µm silk mesh over a 10 nautical mile tow
and stored within the recorder for later identification. Details of the sampling and 
analysis methods can be found on the Sir Alister Hardy Foundation website 
(http://www.sahfos.ac.uk/). 

The number of CPR samples per month by year is shown in Supplementary 
Table 3. The series consists of two periods of data collection, 1961-1976, when there 
were many months without samples, and 1991-2008, when most months were 
sampled. There was a large data gap from 1977-1990. To compensate for missing 
monthly data, a time series for the period 1961-1976 was created by averaging 
monthly values over successive 5-year periods. For the better-sampled 1991-2008 
period, 3-year averaging blocks were used. The averaging acts like a rough low-pass 
filter. The positions of all samples are shown in Supplementary Fig. 9. 
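
A minimal sketch of block averaging over successive multi-year periods is given below. It assumes that all available monthly samples within a block are pooled into a single mean; the text does not specify whether values were first averaged by calendar month, so this is one plausible reading. The function name and the sample values are hypothetical.

```python
from collections import defaultdict


def block_average(monthly, start_year, block_years):
    """Average all available monthly values within successive multi-year blocks.

    monthly: dict mapping (year, month) -> value; missing months are simply absent.
    Returns a dict mapping the first year of each block -> mean of its samples.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (year, _month), value in monthly.items():
        if year < start_year:
            continue  # ignore samples before the first block
        block_start = start_year + ((year - start_year) // block_years) * block_years
        sums[block_start] += value
        counts[block_start] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


# Hypothetical sparse abundances (not CPR data).
sparse = {(1961, 3): 10.0, (1963, 7): 20.0, (1966, 4): 40.0}
early = block_average(sparse, start_year=1961, block_years=5)
```

The same function with `block_years=3` would correspond to the shorter blocks used for the better-sampled period.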

Zooplankton data obtained from the CPR Program were grouped into large-bodied
(≥2 mm carapace length) and small-bodied (<2 mm) species. The large-bodied
category comprised all Calanus species and their copepodite stages including
C. finmarchicus, C. hyperboreus, C. helgolandicus and C. glacialis; all Metridia, 
Euchaeta and Pleuromamma species including M. lucens, M. longa, E. marina, 
E. norvegica, P. abdominalis and P. robusta; also included were Candacia sp., 
C. armata, Heterorhabdus sp. and H. papilliger, and Neocalanus gracilis, 
Rhincalanus nasutus, Euchirella rostrata and Anomalocera patersoni. 

The small-bodied category included: all Centropages and Temora species includ- 
ing C. typicus, C. hamatus, C. bradyi, C. chierchiae, T. longicornis and T. stylifera; 
also Pseudocalanus, Candacia, Paracalanus, Oncaea, Sapphirina, Lucicutia, 
Scolecithricella, Clausocalanus and Calocalanus sp.; and Acartia sp., Acartia danae, 
Corycaeus sp. Ctenocalanus vanus, Mecynocera clausi, Nannocalanus minor, 
Pleuromamma sp., Pleuromamma borealis, Lucicutia and Tortanus discaudatus. 
Further details of the species identification protocols used are provided elsewhere’. 
Bottom trawl survey precision. The precision of the biomass estimates at both 
the functional group and species level was evaluated by using the random, strati- 
fied survey design (27 strata) to calculate the relative standard error (rse, standard 
error divided by the mean and expressed as a percentage)”. For the aggregate 
predator biomass, the rse (averaged over 41 years) equalled 17.5% (range: 8-40%). 
The rse for the annual forage fish biomass was averaged over fewer
surveys (1993 to 2010) because, before 1993, there were very low levels of forage
fish, with the majority of survey sets yielding null values (>70%). The resulting rse
was 32% (range: 16-59%). The rse of the annual biomasses of the individual 
predator species are shown in Supplementary Fig. 6. The average value for cod- 
like species (35%) compares favourably to surveys in other geographic regions”. 
The forage species biomass estimates averaged 53, 45 and 38%, for capelin, northern 
sand lance and herring, respectively. 

We also considered the probability that the aggregate benthic fish biomass from
the relatively constant period of 2000-2004 (168,000 t) was different from the 2010 
value (336,000 t) represented by a linear fit to the 2004-2010 observations when 
the biomass had an average increase of 28,000 t per year. From the analysis above, 
we took 19% as a representative rse and computed the probability that the 2000- 
2004 value differed from the 2010 best-fit value. The probability that the 2000- 
2004 average value was greater than or equal to the best-fit 2010 value was 0.009. 
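
The text does not state how this probability was computed. The sketch below shows one calculation that gives a value close to the reported 0.009, assuming (our assumption, not the authors' stated method) that both biomass values carry independent, normally distributed errors with the same 19% rse. The function name is hypothetical.

```python
import math


def prob_earlier_at_least_later(earlier, later, rse):
    """P(earlier >= later) for two independent normal estimates sharing one rse."""
    sd_diff = rse * math.hypot(earlier, later)  # s.d. of (later - earlier)
    z = (later - earlier) / sd_diff
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability


# Biomass values (t) and rse taken from the paragraph above.
p = prob_earlier_at_least_later(earlier=168_000, later=336_000, rse=0.19)
```

Under these assumptions `p` comes out slightly above 0.009. Treating only the earlier value as uncertain would give a far smaller probability, so the choice of error model matters.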
Commercial landings and fishery regulations. Annual eastern Scotian Shelf 
landings for the benthic species were extracted from databases maintained by 
the Northwest Atlantic Fisheries Organization (NAFO) and the DFO. The 
management unit, NAFO Div. 4VW (Supplementary Fig. 1), goes beyond the 
continental shelf into offshore slope waters where fishing effort was concentrated 
for certain species, specifically silver hake. Therefore, estimation of meaningful 
exploitation levels based on the ratio of landings to biomass (derived from the 
fishery-independent trawl surveys that are constrained to the continental shelf) 
required the exclusion of the landings data for silver hake. No survey catchability 
corrections were made for any of the benthic fish species and this may result in 
annual ratios greater than 1. Hence, this measure of exploitation, often referred to 
as relative F (fishing mortality), is meant to serve as an index of commercial 
exploitation (Fig. 1c). There are no contemporary (post-2000) estimates of 
instantaneous fishing mortality rates because of the low levels of fishing associated 
with the cod and haddock fishing moratorium; for eleven species that were not 
aged there have never been any estimates of instantaneous fishing mortality rates. 
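
As an illustration of the exploitation index, the ratio can be written as a one-line function. The input values below are hypothetical and were chosen only to show why the index can exceed 1 when survey biomass is not corrected for catchability.

```python
def relative_f(landings_t, survey_biomass_t):
    """Relative F: annual landings divided by survey-estimated biomass."""
    if survey_biomass_t <= 0:
        raise ValueError("survey biomass must be positive")
    return landings_t / survey_biomass_t


# Hypothetical values in tonnes (not data from the paper).
heavy = relative_f(105_000, 200_000)    # just over 50% annual removal
above_one = relative_f(60_000, 50_000)  # exceeds 1 because the survey underestimates biomass
```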

In September of 1993 the eastern Scotian Shelf fishery for cod and haddock was 
closed while fisheries for flatfish, skates, redfish, silver hake (omitted) and some 
other minor species have remained open and, since then, total landings have 
averaged ~6,000 t. For silver hake, landings averaged 16,000 t during this same
period and the percentage of cod by-catch associated with this directed fishery was 
very low (0.01%). The second leading fishery in this area was for redfish (4,500 t) 
and it incurred by-catches of cod of 0.8%. This information is available at http:// 
www.dfo-mpo.gc.ca/fm-gp/initiatives/cod-morue/strategic-mar-eng.htm. 
Principal components analysis and regime shift detection. The status of the 
eastern Scotian Shelf ecosystem was characterized by principal components ana- 
lysis based on standardized anomalies (s.d. units) from five biological time series 
(CPR phytoplankton colour index, zooplankton (body size < and ≥ 2 mm), forage
fish and large benthic predators) using a method previously described". PCA1 and 
2 accounted for 56.3% and 18% of the variance, respectively. We compared the 
time series variability of PCA1 constructed from the correlation matrices based on 
the data series from 1970-2010, which include data gaps for the three lower 
trophic levels from 1975-1991 and 2008-2009 and from the period (1970-74; 
1992-2007) when all series were complete. The r² between the two PCA1
series was 0.99. The loadings based on the 1970-2009 (1970-1974; 1992-2008)
series of the five variables on PCA1 were: predators = 0.49 (0.50), forage fish
complex = −0.44 (−0.46), large-bodied zooplankton = 0.43 (0.43), small-bodied
zooplankton = −0.33 (−0.32), phytoplankton colour index = −0.52 (−0.51).

To objectively determine whether and when ecosystem transitions occurred, the 
dominant principal component axis (PCA1) from the five-variable biological 
analysis was subjected to a sequential t-test analysis of regime shift method 
(STARS) that identifies the magnitude and direction of shifts significant at a 
pre-determined α-level, given both the expected cut-off length for the regime
(L) and a parameter that designates how much of a difference from the observed
mean (in s.d. units) is required before data are considered outliers” (available at
http://www.beringclimate.noaa.gov/regimes/index.html). Choice of L and α affects
the number and duration of regime shifts”. We set L = 5 and α = 0.01, in keeping
with previous analyses using a subset of ecosystem data where shifts in Scotian
Shelf cod-prey states were shown to be relatively insensitive to changes in L and H
(a parameter describing the treatment of outliers) and where the relatively stringent
α-value was established to limit regime shift detection to only cases with
strong evidence”. In our analyses, L = 10 and L = 15 did not affect the timing of
the biological regime shifts; setting α = 0.05 only suggested one additional small
regime shift in 1975 when L = 5. 

Carrying capacity estimation. Estimates of the zooplankton biomass on the 
eastern Scotian Shelf have been compiled*’; additional stations were available 
through the BIOCHEM database (http://www.meds-sdmm.dfo-mpo.gc.ca/ 
biochem/Biochem_e.htm) and courtesy of C. Johnson and B. Casault. Biomass 
estimates were made using vertical hauls from the bottom to the surface using 
200 µm mesh nets. Observations from 1999-2008 provided reasonable spatial
coverage for the months of March, April, July and October. March and July
observations were made during ecosystem fisheries surveys; April and October 
data, from Atlantic Zone Monitoring Program cruises, were collected on three 
standard, repeated sections at the northeastern (Cabot Strait Section), central 
(Louisbourg Section) and southwestern end (Halifax Section) of NAFO Div. 
4VW (Supplementary Fig. 10). 

The zooplankton biomasses are shown in Supplementary Table 4. Taking the 
simple average of these observations and multiplying by the surface area of the 
eastern shelf gives a zooplankton standing crop of 4.8 × 10⁶ t. Previous work"!
estimated a zooplankton production to biomass ratio of 6 to 9.7, and used an
average value of 7 as an overall factor applied to zooplankton as a group. Another
more rigorous, quantitative analysis*’, estimated production to biomass (P/B)
ratios for nine species, including leading components of the overall biomass,
and for a single general category. Their biomass-weighted P/B ratio was 8.9.
Multiplying the average biomass in Supplementary Table 4 by this factor yields
an overall zooplankton production of approximately 4.3 × 10⁷ t. Using a 10%
efficiency relating the zooplankton production to forage fish yields a rough estimate
of the forage fish carrying capacity of 4.3 × 10⁶ t.
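
The arithmetic behind this estimate can be checked with a short script. The powers of ten used here are inferred from the 4.3 million t carrying capacity quoted in the main text and from the P/B ratio of 8.9; the variable names are our own.

```python
standing_crop_t = 4.8e6     # zooplankton standing crop (t), exponent inferred
p_to_b = 8.9                # biomass-weighted production-to-biomass ratio
transfer_efficiency = 0.10  # zooplankton production converted to forage fish

# Annual zooplankton production, then the share available to forage fish.
zooplankton_production_t = standing_crop_t * p_to_b
carrying_capacity_t = zooplankton_production_t * transfer_efficiency
```

The result is about 4.3 × 10⁷ t of zooplankton production and about 4.3 × 10⁶ t of forage fish carrying capacity, matching the rounded figures in the text.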

Similar to our calculation of carrying capacity for forage species, we estimated 
the annual production of phytoplankton on the eastern Scotian Shelf based on 
measurements from three sources***. The data from two sources**” covered the 
months of March—August and November; the observations from the third’” con- 
sisted of 13 monthly estimates from March 1991 to March 1992. All observations 
were reported as mg of carbon per m per hour. We converted mg C to wet weight 
of phytoplankton using a factor of 42 (ref. 49), thus allowing direct comparison 
with our estimates of zooplankton production. Combining two sets**”” of results, 
we estimate the annual production of phytoplankton as 640 × 10⁶ t and
240 × 10⁶ t, respectively. These give ratios of phytoplankton to zooplankton production
of 15 and 5.5 which are reasonable if the energy transfer efficiency is
~10%. One crude measure for estimating the carrying capacity for the predator 
complex involved taking the ratio of peak biomasses of benthic fish predators to 
forage fishes which is about 1:16. 
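
The production ratios quoted above follow directly from the two phytoplankton estimates and the zooplankton production of roughly 4.3 × 10⁷ t. In this sketch the powers of ten are inferred from the stated ratios, and the variable names are hypothetical.

```python
zooplankton_production_t = 4.3e7             # from the carrying capacity estimate
phytoplankton_production_t = [640e6, 240e6]  # the two annual estimates (t)

# Ratio of phytoplankton to zooplankton production for each estimate.
ratios = [p / zooplankton_production_t for p in phytoplankton_production_t]

# Implied transfer efficiency if all zooplankton production derives from phytoplankton.
implied_efficiency = [1.0 / r for r in ratios]
```

The implied efficiencies bracket the nominal 10% value, which is the sense in which the ratios are described as reasonable.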

Damped harmonic oscillator calculations. The input data set was the biomass of 
the three leading forage species. The period under consideration was 1994-2010. 
We used a temporally varying background state derived by running a 30% linear 
LOESS filter through the data. The filter split the variability into two components: 
a very low frequency component with a period estimated as 56 years from the 
autocorrelation function and higher frequency variability (Supplementary Fig. 8). 

The anomalies indicated a simple damped harmonic oscillator-like (dSHO) 
variability, particularly from 1994 to 2001, shown as solid grey circles in Fig. 3. 
We solved for the characteristics of this variability by least squares fitting the 
observations to the dSHO equation: 


$$A_0 + A_1 e^{-d_1 t} \cos(2\pi t/\tau),$$


where $A_0$ is the mean, $A_1$ the amplitude of the harmonic oscillation, $d_1$ the dampening
rate, and $\tau$ the periodicity of the oscillation. The solution based on optimal
fitting of 1994-2001 observations, shown as a solid red line in Fig. 3, gives
$A_0$ = 347,000 t, $A_1$ = 6,100,000 t, $1/d_1$ = 6.9 years, $\tau$ = 5.2 years.
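
To make the fitted solution concrete, the sketch below evaluates a damped harmonic oscillator of the form A0 + A1 exp(−d1 t) cos(2πt/τ) with the reported parameter values. Two assumptions are ours: that this is the functional form intended, and that t is measured in years from the 1994 peak. The function names are hypothetical.

```python
import math

A0 = 347_000.0      # mean anomaly (t)
A1 = 6_100_000.0    # oscillation amplitude (t)
DAMPING_TIME = 6.9  # 1/d1 (years)
PERIOD = 5.2        # oscillation period tau (years)


def dsho(t_years):
    """Anomaly (t) at t years after the 1994 peak (assumed time origin)."""
    envelope = math.exp(-t_years / DAMPING_TIME)
    return A0 + A1 * envelope * math.cos(2.0 * math.pi * t_years / PERIOD)


def envelope_decline(t_years):
    """Fractional decrease of the oscillation envelope after t years."""
    return 1.0 - math.exp(-t_years / DAMPING_TIME)


peak = dsho(0.0)                            # A0 + A1 at the start of the fit
trough = dsho(PERIOD / 2.0)                 # first undershoot, half a period later
decline_1994_2005 = envelope_decline(11.0)  # envelope decay over 11 years
```

Under these assumptions the envelope decays by roughly 80% between 1994 and 2005, which is of the same order as the 78% decrease in forage fish biomass quoted in the main text; the published figure may have been derived differently.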

Environmental indices and their relationship to the pre-collapse, collapse and 
post-collapse periods. The relationship between forage fish biomass and water 
column temperatures was evaluated. The results showed no relationship between 
the two variables at 0 lag (Supplementary Fig. 5). We also lagged the forage fish 
biomass relative to the temperature; for lags up to 4 years, the r² was less than 0.04.
At lags of 5 years, r² = 0.21; however, this is approaching the life cycle of these
species and the biological import of the enhanced correlation at this lag is therefore
questionable. Finally, we examined integrated temperatures up to 5 years. All r²
values were less than 0.02.
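
A lagged correlation of this kind can be sketched as follows. The series are synthetic and constructed so that biomass echoes temperature two years later; they are not data from the paper, and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_r2(temperature, biomass, lag):
    """r^2 between temperature in year t and biomass in year t + lag."""
    if lag < 0 or lag > len(temperature) - 2:
        raise ValueError("lag leaves fewer than two paired observations")
    x = temperature[: len(temperature) - lag]
    y = biomass[lag:]
    return pearson_r(x, y) ** 2


# Synthetic series: biomass is a linear function of temperature two years earlier.
temperature = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 2.0, 7.0]
biomass = [0.0, 0.0] + [2.0 * t + 1.0 for t in temperature[:-2]]

r2_by_lag = {lag: lagged_r2(temperature, biomass, lag) for lag in range(0, 5)}
best_lag = max(r2_by_lag, key=r2_by_lag.get)
```

Because each added year of lag removes one paired observation, correlations at long lags rest on fewer points and should be read with corresponding caution.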

An examination of the seasonal variability of the stratification index showed 
that the long-term trend seen in the annual data was due mostly to the summer and 
secondarily to the fall series (Supplementary Fig. 11). Moreover, the magnitudes of
the changes during winter and spring (major bloom period) were small compared
to those in summer and fall. During the period of ongoing decline of the
benthic fish functional group (1985-1994), the stratification
parameter showed near-normal values during the spring bloom period. The outbreak
period (1994-2001) of the forage group was weakly (r² = 0.21) and not
significantly related to the spring stratification.

The seasonal variation of the phytoplankton colour index (Supplementary Fig.
12) and their correlations with the annual index, their averages and deviations
(Supplementary Table 5) indicated that the greatest contributions were from the
winter and spring seasons (high correlations and variance), followed by fall (high 
correlation, moderate variance), and finally by summer (low correlation, lowest 
variance). This indicated that summer, with the greatest stratification changes, had 
the least impact on the annual colour index; fall, with the next greatest impact on 
stratification, made the second least contribution to the index. Further compar- 
isons of the monthly colour indexes with the annual revealed that the greatest 
contributions were from March and April, typical months for the spring bloom on 
the eastern Scotian Shelf (Supplementary Table 6). 

We assessed the impact of the full range of temperatures observed on the eastern
Scotian Shelf using the relationship between the intrinsic rate of population
growth, r, age at maturity, a, and bottom temperatures developed previously for
20 stocks of North Atlantic cod*’. This analysis revealed that the estimates of r and
a so derived were, during the pre-collapse period, on average 4% higher and
3% lower, respectively, than during the collapsed period and 3% higher and
2% lower, respectively, for the recovering period. We also note that the period
1987-1991, immediately preceding the cod collapse, featured some of the coldest 
bottom water temperatures on record. Again applying the approach detailed 
above, and the r and a data provided”, we estimate a maximum 10% decrease 
in r and a 9% increase in a relative to the collapsed and post-collapse intervals. 
During this cold period, fishing mortality on cod, based on the ratio of landings to 
biomass, averaged 65%, which is an extremely high annual rate of biomass 
removal. This dwarfs any possible impact of the environment as expressed through 
the effect of temperature on r and a. 

The population dynamics and feeding ecology of grey seals on the eastern
Scotian Shelf have been assessed since the early 1960s, with sampling effort
concentrated on the colony inhabiting Sable Island. Total population sizes were
estimated from a model fit to census data on pup production (Supplementary Fig. 13).

In the central nervous system, ageing results in a precipitous decline
in adult neural stem/progenitor cells and neurogenesis, with con- 
comitant impairments in cognitive functions’. Interestingly, such 
impairments can be ameliorated through systemic perturbations 
such as exercise’. Here, using heterochronic parabiosis we show that 
blood-borne factors present in the systemic milieu can inhibit or 
promote adult neurogenesis in an age-dependent fashion in mice. 
Accordingly, exposing a young mouse to an old systemic environ- 
ment or to plasma from old mice decreased synaptic plasticity, and 
impaired contextual fear conditioning and spatial learning and 
memory. We identify chemokines—including CCL11 (also known 
as eotaxin)—the plasma levels of which correlate with reduced 
neurogenesis in heterochronic parabionts and aged mice, and the 
levels of which are increased in the plasma and cerebrospinal fluid of 
healthy ageing humans. Lastly, increasing peripheral CCL11 
chemokine levels in vivo in young mice decreased adult neuro- 
genesis and impaired learning and memory. Together our data 
indicate that the decline in neurogenesis and cognitive impairments 
observed during ageing can be in part attributed to changes in 
blood-borne factors. 

Adult neurogenesis occurs in local microenvironments, or neuro- 
genic niches, in the subventricular zone (SVZ) of the lateral ventricles 
and the subgranular zone (SGZ) of the hippocampus*’. Permissive 
cues within the neurogenic niche are thought to drive the production 
of new neurons and their subsequent integration into the neurocircui- 
try of the brain*”, directly contributing to cognitive processes includ- 
ing learning and memory*”. Importantly, the neurogenic niche is 
localized around blood vessels’®"', allowing for potential communica- 
tion with the systemic environment. Therefore, the possibility arises 
that diminished neurogenesis during ageing may be modulated by the 
balance of two independent forces: intrinsic central nervous system 
(CNS)-derived cues'*“* and cues extrinsic to the CNS delivered by blood. 
Thus we hypothesized that age-related systemic molecular changes could 
cause a decline in neurogenesis and impair cognitive function during 
ageing. 

We first characterized cellular, electrophysiological and behavioural 
changes associated with the neurogenic niche in the dentate gyrus of 
the hippocampus in an ageing cohort of mice. We observed cellular 
changes consistent with markedly decreased adult neurogenesis’ and 
increased neuroinflammation with age’ (Supplementary Fig. 2a-e). 
Additionally, we detected deficits in synaptic plasticity (Supplementary
Fig. 3a-c), and behavioural deficits in contextual fear conditioning
(Supplementary Fig. 4a-c) and radial arm water maze (RAWM;
Supplementary Fig. 4d-f) paradigms in old animals, consistent with
decreased cognitive function during ageing”®.

Next we investigated the contribution of peripheral systemic factors 
to the age-related decline in neurogenesis in the dentate gyrus of the 
hippocampus in the setting of isochronic (young-young and old-old) 
and heterochronic (young-old) parabiosis (Fig. 1a). Remarkably, the 
number of doublecortin (Dcx)-positive newly born neurons (Fig. 1b, c),
BrdU-positive cells (Fig. 1e, f) and Sox2-positive progenitors (Supplementary
Fig. 5a, b) decreased in young heterochronic parabionts.
In contrast, we observed an increase in the number of Dcx-positive
(Fig. 1b, d), BrdU-positive (Fig. 1e, g) and Sox2-positive (Supplementary
Fig. 5a, c) cells in the old heterochronic parabionts. The number of
Dcx-positive neurons did not differ between unpaired age-matched animals and
isochronic animals (Supplementary Fig. 6a, b). As a
control, flow cytometry analysis confirmed a shared vasculature in a
subset of parabiotic pairs, in which one parabiont was transgenic for
green fluorescent protein (GFP; Supplementary Fig. 7a-d). Together
our findings suggest that global age-dependent systemic changes can 
modulate neurogenesis in both the young and aged neurogenic niche, 
potentially contributing to the decline in regenerative capacity observed 
in the normal ageing brain. 

As previously reported by others’, we rarely detected peripherally 
derived GFP cells in the CNS of wild-type mice, and these numbers did 
not differ between isochronic and heterochronic pairings (Supplemen- 
tary Fig. 7e), suggesting that the observed effects are mediated by 
soluble factors in plasma. To confirm that circulating factors within 
aged blood contribute to reduced neurogenesis with age, we intrave- 
nously injected plasma isolated from young or old mice into young 
adult mice (Fig. 2a). The number of Dcx-positive cells in the dentate 
gyrus decreased in animals receiving old plasma compared to animals 
receiving young plasma (Fig. 2b, c), indicating that soluble factors 
present in old blood inhibit adult neurogenesis. 

To investigate the functional effect of the ageing systemic milieu, 
extracellular electrophysiological recordings were done on hippocampal 
slices prepared from young isochronic and heterochronic parabionts 
(Fig. 1h, i and Supplementary Fig. 6c). A decrease in long-term potentiation
(LTP) in the dentate gyrus of heterochronic parabionts was detected
(Fig. 1h, i). These data indicate that age-related systemic changes can
elicit deficits in synaptic plasticity. As LTP is considered to be a correlate
of learning and memory’, these findings suggest that age-related
systemic changes may also affect cognitive functions during ageing.

Figure 1 | Heterochronic parabiosis alters neurogenesis in an age-dependent
fashion. a, Schematic showing parabiotic pairings.
b, e, Representative fields of Dcx (b) and BrdU (e) immunostaining of young
(3-4 months; yellow) and old (18-20 months; grey) isochronic and
heterochronic parabionts 5 weeks after parabiosis (arrowheads point to
individual cells; scale bars, 100 μm). c-f, Quantification of neurogenesis
(c, d) and proliferating cells (e, f) in the young (c, e; top) and old (d, f; bottom)
dentate gyrus (DG) after parabiosis. Data from 12 young isochronic, 10 young
heterochronic, 6 old isochronic and 12 old heterochronic parabionts.
g, h, Population spike amplitude (PSA) was recorded from the dentate gyrus of
young parabionts. Representative electrophysiological profiles (g) and LTP
levels (h) are shown for young heterochronic and isochronic parabionts. Data
from 4-5 mice per group. All data are represented as mean ± s.e.m.; *P < 0.05;
**P < 0.01, t-test.

Figure 2 | Factors from an old systemic environment decrease neurogenesis
and impair learning and memory. a, Schematic of young (3-4 months) or old
(18-22 months) plasma extraction and intravenous (i.v.) injection into young
(3 months) adult mice. b, Representative field of Dcx immunostaining of young
adult mice after plasma injection treatment four times over 10 days (scale bar,
100 μm). c, Quantification of neurogenesis in the young dentate gyrus after
plasma injection. Data from 8 mice injected with young plasma and 7 mice
injected with old plasma. d, e, Hippocampal learning and memory assessed by
contextual fear conditioning (d) and RAWM (e) paradigms in young adult
mice after young or old plasma injections nine times over 24 days. d, Percent
freezing time 24 h after training. Data from 8 mice per group. e, Number of
entry arm errors before finding platform. Data from 12 mice per group. All data
represented as mean ± s.e.m.; *P < 0.05; **P < 0.01, t-test (c, d), repeated
measures ANOVA, Bonferroni post-hoc test (e).

Subsequently, we tested hippocampal-dependent learning and
memory using contextual fear conditioning and RAWM paradigms 
in young adult mice intravenously injected with young or old plasma 
(Fig. 2d, e). During fear conditioning training mice exhibited no dif- 
ferences in baseline freezing regardless of plasma injection treatment 
(Supplementary Fig. 8a). However, mice receiving old plasma demon- 
strated decreased freezing in contextual (Fig. 2d), but not cued 
(Supplementary Fig. 8b), memory testing. During the training phase 
of the RAWM all mice showed similar swim speeds (Supplementary 
Fig. 8c) and spatial learning capacity for the task (Fig. 2e). However, 
during the testing phase animals that had received old plasma demon- 
strated impaired learning and memory for platform location (Fig. 2e). 
As a control, we tested the RAWM paradigm in young adult mice with
ablated hippocampal neurogenesis and observed corresponding behavioural
deficits (Supplementary Fig. 9a-e). Collectively, these data
indicate that factors present in ageing blood inhibit adult neurogenesis, 
and functionally contribute to impairments in cognition. 

Previous pioneering studies focused on muscle have shown that 
exposure of the aged stem cell niche to a young systemic environment 
through heterochronic parabiosis results in increased regeneration after 
muscle injury’”, in part by involving Notch signalling”*. However, indi- 
vidual systemic factors associated with ageing and tissue degeneration 
have not yet been characterized or investigated for their role in regulat- 
ing the decline in tissue regeneration. To identify such factors, we used a 
proteomic approach in which relative levels of 66 cytokines, chemokines 
and other secreted signalling proteins were measured in the plasma of 
normal ageing mice using standardized multiplex sandwich enzyme- 
linked immunosorbent assays (ELISAs; Luminex) (Supplementary 
Table 1). Using multivariate analysis, we identified seventeen proteins 
whose levels increased and correlated with decreased neurogenesis 
during ageing (Fig. 3a and Supplementary Fig. 10a, b). To identify 
systemic factors associated with heterochronic parabiosis, we analysed 
plasma samples from young and old mice before and after pairings in an 
independent proteomic screen. Comparison of young isochronic and 
heterochronic cohorts identified fifteen factors that increased in hetero- 
chronic parabionts (Fig. 3a and Supplementary Fig. 10c), whereas com- 
parison between old isochronic and heterochronic cohorts revealed four 
factors that decreased in isochronic parabionts (Supplementary Fig. 10c). 
Interestingly, only six factors (CCL2, CCL11, CCL12, CCL19, haptoglobin
and β2-microglobulin) were elevated in old unpaired and
young heterochronic cohorts (Fig. 3a). Of these, CCL11 is a chemokine
involved in allergic responses and not previously linked to ageing, 
neurogenesis or cognition. Relative levels of CCL11 were increased 
in the plasma of mice during normal ageing (Fig. 3b) and in young 
mice during heterochronic parabiosis (Fig. 3c). Furthermore, we 
detected an age-related increase in CCL11 in plasma (Fig. 3d) and 
cerebrospinal fluid (CSF; Fig. 3e), from healthy human individuals 
between 20 and 90 years of age, suggesting that this age-related sys- 
temic increase is conserved across species. 

Having identified CCL11 as an age-related systemic factor asso- 
ciated with decreased neurogenesis, we tested its potential biological 
relevance in vivo. We administered CCL11 protein through intra- 
peritoneal injections into young adult Dcx-luciferase reporter mice”, 
and using a non-invasive bioluminescent imaging assay detected a 
significant decrease in neurogenesis (Supplementary Fig. 11b, c). Using 
immunohistochemical analysis we investigated the effect of systemic 
CCL11 on adult hippocampal neurogenesis in young adult wild-type 
mice. We administered CCL11 or vehicle alone, and in combination 
with either an anti-CCL11 neutralizing antibody or an isotype control 
antibody through intraperitoneal injections (Fig. 4a). Systemic admini- 
stration of CCL11 induced an increase in CCL11 plasma levels 
(Supplementary Fig. 11a), and significantly decreased the number of 
Dcx-positive cells in the dentate gyrus (Fig. 4b, c). Importantly, this 
decrease in neurogenesis could be rescued by systemic neutralization of 
CCL11 (Fig. 4b, c). Likewise, BrdU-positive cells also showed similar 
changes in cell number (Supplementary Fig. 11d, e), and furthermore
the percentage of cells expressing both BrdU and NeuN decreased
(Supplementary Fig. 11f, g). The percentage of cells expressing BrdU
and GFAP did not significantly change (Supplementary Fig. 11f, h). As
a negative control we assayed neurogenesis after systemic administration
of monocyte colony stimulating factor (MCSF), a measured
protein whose plasma levels do not change with age, and detected no
change in Dcx-positive cells in the dentate gyrus (Supplementary Fig.
12a-d). Together, these data indicate that increasing the systemic level
of the age-related factor CCL11 is sufficient to partially recapitulate
some of the inhibitory effects observed with ageing and heterochronic
parabiosis.

Figure 3 | Systemic chemokine levels increase during ageing and
heterochronic parabiosis, and correlate with decreased neurogenesis.
a, Venn diagram of results from ageing and parabiosis proteomic screens. In
grey are shown the seventeen age-related plasma factors that correlated most
strongly with decreased neurogenesis, in red are shown the fifteen plasma
factors that increased between young isochronic and young heterochronic
parabionts, and in the brown intersection are the six factors elevated in both
screens. Data from 5-6 mice per age group. b, c, Changes in plasma
concentrations of CCL11 with age (b) and young heterochronic parabionts pre-
and post-parabiotic pairing (c). d, e, Changes in plasma (d; r = 0.40;
P = 5.6 × 10⁻⁷; 95% confidence interval = 0.26-0.53) and CSF
(e) concentrations of CCL11 with age in healthy human subjects. All data
represented as dot plots with mean; *P < 0.05; **P < 0.01; ***P < 0.001, t-test
(c, e), ANOVA, Tukey’s post-hoc test (a, b), and Mann-Whitney U Test (d).

Additionally, we investigated the possibility that age-related blood- 
borne factors influence neural progenitor activity and neural differ- 
entiation in vitro. We assayed the number of neurospheres formed 
after exposure of primary neural stem/progenitor cells (NPCs) to 
serum from aged mice and observed a 50% decrease when compared 
to exposure to serum from young mice (Supplementary Fig. 13a). We 
then tested the effect of CCL11 and observed that the number and size 
of neurospheres formed from primary NPCs exposed to CCL11 significantly
decreased (Supplementary Fig. 13b-d). Using a human-derived
NTERA cell line expressing enhanced (e)GFP under the Dcx
promoter, we assayed neural differentiation and observed a significant
decrease in eGFP expression after 12 days in culture with CCL11
(Supplementary Fig. 13e, f). Although these findings open the possibility
of a direct interaction of systemic factors with progenitor cells in vivo
during ageing, they do not preclude the possibility of indirect actions
by interactions with other neurogenic niche cell types.

To examine the effect of CCL11 on neurogenesis in the brain we
stereotaxically injected CCL11 into the dentate gyrus of young adult
mice, and observed a decrease in the number of Dcx-positive cells
compared with the contralateral dentate gyrus, which received vehicle
control (Supplementary Fig. 14a, b). Furthermore, we examined
whether the inhibitory effect of peripheral CCL11 could be reversed
locally within the hippocampus. We stereotaxically injected CCL11-specific
neutralizing antibody and isotype control antibodies into the
contralateral dentate gyrus of young adult mice (Fig. 4d). After stereotaxic
injection, we systemically administered CCL11 or vehicle control
by intraperitoneal injections (Fig. 4d). The decrease in Dcx-positive
cell number observed in mice that received systemic CCL11 could be
rescued by neutralizing CCL11 within the dentate gyrus (Fig. 4e, f),
suggesting that the increase in systemic chemokine levels exerts a
direct effect in the CNS.

Figure 4 | Systemic exposure to CCL11 inhibits neurogenesis and impairs
learning and memory. a, Schematic of young (3-4 months) mice injected
intraperitoneally with CCL11 or vehicle, and in combination with anti-CCL11
neutralizing or isotype control antibody (Ab). b, Representative field of
Dcx-positive cells for each treatment group (n = 6-10 mice) treated four times over
10 days. i.p., intraperitoneal. Scale bar, 100 μm. c, Quantification of neurogenesis
in the dentate gyrus after treatment. d, Schematic of young adult mice given
unilateral stereotaxic injections of anti-CCL11 neutralizing or isotype control
antibody followed by systemic injections with either recombinant CCL11 or PBS
(vehicle). e, Representative field of Dcx-positive cells in adjacent sides of the
dentate gyrus for each treatment group (n = 3-11 mice). Scale bar, 100 μm.
f, Quantification of neurogenesis in the dentate gyrus after systemic and
stereotaxic treatment. Bars represent mean number of cells in each section.
g, h, Learning and memory assessed by contextual fear conditioning (g) and
RAWM (h) paradigms in young adult mice injected with CCL11 or vehicle every
3 days for 5 weeks (n = 12-16 mice per group). All data are represented as
mean ± s.e.m.; *P < 0.05; **P < 0.01; ANOVA, Dunnett’s or Tukey’s post-hoc
test (c, f); repeated measures ANOVA, Bonferroni post-hoc test (h).

Lastly, to determine the physiological relevance of increased sys- 
temic CCL11 levels in mice we assessed hippocampal-dependent learn- 
ing and memory using contextual fear conditioning and RAWM 
paradigms (Fig. 4g, h). Young adult mice received intraperitoneal injec- 
tions of recombinant CCL11 or vehicle control. During fear condition- 
ing training all mice, regardless of treatment, exhibited no differences in 
baseline freezing (Supplementary Fig. 15a). In contrast, mice that had 
received CCL11 demonstrated decreased freezing during contextual 
(Fig. 4g), but not cued (Supplementary Fig. 15b), memory testing. 
During the training phase of the RAWM task all mice regardless of 
treatment showed similar swim speeds (Supplementary Fig. 15c) and
learning capacity for the task (Fig. 4h). However, by the end of the
testing phase animals that had received CCL11 exhibited
learning and memory deficits (Fig. 4h). Together, these functional data
demonstrate that increasing the systemic level of CCL11 can not only 
inhibit adult neurogenesis, but also impair learning and memory. 

Cumulatively, our data link age-related molecular changes in the 
systemic milieu to the age-related decline in adult neurogenesis, and 
impairments in synaptic plasticity and cognitive function observed 
during ageing (Supplementary Fig. 1). Whereas local immune signal- 
ling in the brain is emerging as a critical modulator of NPC func- 
tion'’"*?!? and neurodegeneration''’**’, we now identify systemic 
immune-related factors as potentially critical contributors to the sus- 
ceptibility of the ageing brain to cognitive impairments. Interestingly, 
members of the identified age-related chemokines (CCL2, CCL11 and 
CCL12) are localized to within 70 kb on mouse chromosome 11, and 
within 40 kb on human chromosome 17, implicating this genetic locus 
in normal brain ageing and possibly ageing in general. Indeed, work 
investigating cellular senescence, a known hallmark of ageing, further 
suggests the involvement of some of the individual systemic chemo- 
kines reported here (CCL2) in the ageing process as components of the 
senescence-associated secretory phenotype”. Lastly, although the 
proteomic platform we used here was sufficient to identify systemic 
inhibitory ‘ageing’ factors it will be critical to develop and utilize 
broader proteomic screens to facilitate the discovery of systemic 
pro-neurogenic ‘rejuvenating’ factors with the potential to ameliorate 
age-related cognitive dysfunction. 


METHODS SUMMARY 


Mouse strains used were C57BL/6 (Jackson Laboratory), C57BL/6 aged mice 
(National Institutes of Ageing), Dcx-Luc*® and C57BL/6J-Act-GFP (Jackson 
Laboratory). All animal use was in accordance with institutional guidelines 
approved by the Veterans Affairs Palo Alto Committee on Animal Research. 
Parabiosis surgery followed previously described procedures’? with the addition 
that peritonea between animals were surgically connected. Immunohistochemistry 
followed standard published techniques”. Extracellular electrophysiology was per- 
formed as previously described”. Spatial learning and memory was assayed with the 
RAWM paradigm as previously published’. Contextual fear conditioning was 
assayed as previously published’’. Relative plasma concentrations of cytokines 
and signalling molecules in mice and humans were measured using antibody-based 
multiplex immunoassays at Rules Based Medicine. Statistical analysis was per- 
formed with Prism 5.0 software (GraphPad Software). Plasma protein correlations 
were analysed with the Significance Analysis of Microarray software (SAM 3.00 
algorithm; http://www.stat.stanford.edu/~tibs/SAM/index.htm). Experiments
were carried out by investigators blinded to the treatment of animals. 


Full Methods and any associated references are available in the online version of
the paper at www.nature.com/nature.

METHODS

Mice. The following mouse lines were used: C57BL/6 (The Jackson Laboratory), 
C57BL/6 aged mice (National Institutes of Ageing), Dcx-Luc mice” and
C57BL/6J-Act-GFP (Jackson Laboratory). For parabiosis experiments male and female
C57BL/6 mouse cohorts were used. For all other in vivo pharmacological and 
behavioural studies young (2-3 months) wild-type C57BL/6 male mice were used. 
Mice were housed under specific pathogen-free conditions under a 12 h light-dark 
cycle and all animal handling and use was in accordance with institutional guide- 
lines approved by the Veterans Affairs Palo Alto Committee on Animal Research. 
Immunohistochemistry. Tissue processing and immunohistochemistry were performed
on free-floating sections following standard published techniques™. Briefly,
mice were anaesthetized with 400 mg kg⁻¹ chloral hydrate (Sigma-Aldrich) and
transcardially perfused with 0.9% saline. Brains were removed and fixed in
phosphate-buffered 4% paraformaldehyde, pH 7.4, at 4 °C for 48 h before they
were sunk through 30% sucrose for cryoprotection. Brains were then sectioned
coronally at 40 μm with a cryomicrotome (Leica Camera) and stored in
cryoprotective medium. Primary antibodies were: goat anti-Dcx (1:500; Santa Cruz
Biotechnology), rat anti-BrdU (1:5,000; Accurate Chemical and Scientific Corp.), 
goat anti-Sox2 (1:200; Santa Cruz), mouse anti-NeuN (1:1,000; Chemicon), mouse 
anti-GFAP (1:1,500; DAKO) and mouse anti-CD68 (1:50; Serotec). After overnight 
incubation, primary antibody staining was revealed using biotinylated secondary 
antibodies and the ABC kit (Vector) with diaminobenzidine (DAB; Sigma-Aldrich) 
or fluorescence-conjugated secondary antibodies. For BrdU labelling, brain sections 
were pre-treated with 2 N HCl at 37 °C for 30 min before incubation with primary
antibody. For double-label immunofluorescence of BrdU/NeuN or BrdU/GFAP,
sections were incubated overnight with rat anti-BrdU, rinsed and incubated for 1 h
with donkey anti-rat antibody (2.5 μg ml⁻¹; Vector) before they were stained with
mouse anti-NeuN antibody. To estimate the total number of Dcx- or Sox2-positive
cells per dentate gyrus, immunopositive cells in the granule cell and subgranular cell
layer of the dentate gyrus were counted in every sixth coronal hemibrain section
through the hippocampus and multiplied by 12.
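
The extrapolation from sampled sections to a whole dentate gyrus can be written as a short calculation. The sketch below assumes that the factor of 12 is the product of the sampling interval (every sixth section) and the two hemispheres; that reading and the function name are assumptions, and the counts in the usage line are invented for illustration.

```python
def estimate_total_cells(counts_per_section, section_interval=6, hemispheres=2):
    """Extrapolate sampled hemibrain counts to a whole-structure estimate.

    counts_per_section: immunopositive cells counted in each sampled section.
    section_interval:   every n-th section was sampled (6 in the text).
    hemispheres:        counts come from one hemibrain (assumed factor of 2).
    """
    if section_interval < 1 or hemispheres < 1:
        raise ValueError("section_interval and hemispheres must be positive")
    return sum(counts_per_section) * section_interval * hemispheres

# Hypothetical counts from three sampled sections
print(estimate_total_cells([10, 12, 8]))
```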

BrdU administration and quantification of BrdU-positive cells. 50 mg kg⁻¹ of
BrdU was injected intraperitoneally into mice once a day for 6 days, and mice were
killed 28 days later or injected daily for 3 days before being killed. To estimate the
total number of BrdU-positive cells in the brain, we performed DAB staining for
BrdU on every sixth hemibrain section. The BrdU-positive cells in the
granule cell and subgranular cell layer of the dentate gyrus were counted and
multiplied by 12 to estimate the total number of BrdU-positive cells in the entire
dentate gyrus. To determine the fate of dividing cells a total of 200 BrdU-positive
cells across 4-6 sections per mouse were analysed by confocal microscopy for co- 
expression with NeuN and GFAP. The number of double-positive cells was 
expressed as a percentage of BrdU-positive cells. 

Parabiosis and flow cytometry. Parabiosis surgery followed previously described 
procedures’. Pairs of mice were anaesthetized and prepared for surgery. Mirror- 
image incisions at the left and right flanks, respectively, were made through the 
skin. Shorter incisions were made through the abdominal wall. The peritoneal 
openings of the adjacent parabionts were sutured together. Elbow and knee joints 
from each parabiont were sutured together and the skin of each mouse was stapled 
(9-mm autoclip, Clay Adams) to the skin of the adjacent parabiont. Each mouse 
was injected subcutaneously with Baytril antibiotic and Buprenex as directed for 
pain and monitored during recovery. Flow cytometric analysis was done on fixed 
and permeabilized blood plasma cells from GFP and non-GFP parabionts. 
Approximately 40-60% of cells in the blood of either parabiont were GFP-positive 
2 weeks after parabiosis surgery. We observed 70-80% survival rate in parabionts 
5 weeks after parabiosis surgery. 

Extracellular electrophysiology. Acute hippocampal slices (400-μm thick) were
prepared from unpaired and young parabionts. Slices were maintained in artificial
cerebrospinal fluid (ACSF) continuously oxygenated with 5% CO₂/95% O₂. ACSF
composition was as follows (in mM): NaCl 124.0; KCl 2.5; KH₂PO₄ 1.2; CaCl₂ 2.4;
MgSO₄ 1.3; NaHCO₃ 26.0; glucose 10.0 (pH 7.4). Recordings were performed with
an Axopatch-2B amplifier and pClamp 10.2 software (Axon Instruments).
Submerged slices were continuously perfused with oxygenated ACSF at a flow
rate of 2 ml min⁻¹ from a reservoir by gravity feeding. Field potential (population
spikes and EPSPs) was recorded using glass microelectrodes filled with ACSF
(resistance: 4-8 MΩ). Biphasic current pulses (0.2 ms duration for one phase,
0.4 ms in total) were delivered in 10-s intervals through a concentric bipolar
stimulating electrode (FHC). No obvious synaptic depression or facilitation was
observed with this frequency stimulation. To record field population spikes in the
dentate gyrus, the recording electrode was placed in the lateral or medial side of the
dorsal part of the dentate gyrus. The stimulating electrode was placed right above
the hippocampal fissure to stimulate the perforant pathway fibres. Signals were
filtered at 1 kHz and digitized at 10 kHz. Tetanic stimulation consisted of 2 trains
of 100 pulses (0.4 ms pulse duration, 100 Hz) delivered with an inter-train interval
of 5 s. The amplitude of the population spike was measured from the initial phase
of the negative wave. Up to five consecutive traces were averaged for each mea- 
surement. Synaptic transmission was assessed by generating input-output curves, 
with stimulus strength adjusted to be ~30% of the maximum. LTP was calculated 
as mean percentage change in the amplitude of the population spike following 
high-frequency stimulation relative to its basal amplitude. 
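
The LTP measure described above reduces to a percentage change relative to baseline. The following is a minimal sketch with a hypothetical function name and invented example values; it assumes the basal amplitude is the mean of the pre-stimulation measurements.

```python
def ltp_percent(baseline_amplitudes, post_amplitudes):
    """Mean percentage change in population spike amplitude after
    high-frequency stimulation, relative to the basal amplitude."""
    if not baseline_amplitudes or not post_amplitudes:
        raise ValueError("both amplitude lists must be non-empty")
    basal = sum(baseline_amplitudes) / len(baseline_amplitudes)
    if basal == 0:
        raise ValueError("basal amplitude must be non-zero")
    changes = [100.0 * (a - basal) / basal for a in post_amplitudes]
    return sum(changes) / len(changes)

# Hypothetical amplitudes in mV: baseline near 1.0, potentiated near 1.5
print(ltp_percent([1.0, 1.0, 1.0], [1.5, 1.5]))
```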

Contextual fear conditioning. The paradigm was done following previously 
published techniques”’. In this task, mice learned to associate the environmental 
context (fear-conditioning chamber) with an aversive stimulus (mild foot shock; 
unconditioned stimulus (US)), enabling testing for hippocampal-dependent con- 
textual fear conditioning. As contextual fear conditioning is hippocampus and 
amygdala dependent, the mild foot shock was paired with a light and tone cue 
(conditioned stimulus (CS)) in order to also assess amygdala-dependent cued fear 
conditioning. Conditioned fear was displayed as freezing behaviour. Specific training
parameters are as follows: tone duration is 30 s; level is 70 dB, 2 kHz; shock
duration is 2 s; intensity is 0.6 mA. This intensity is not painful and can easily be
tolerated but will generate an unpleasant feeling. More specifically, on day 1 each
mouse was placed in a fear-conditioning chamber and allowed to explore for 2 min
before delivery of a 30 s tone (70 dB) ending with a 2 s foot shock (0.6 mA). Two
minutes later, a second CS-US pair was delivered. On day 2 each mouse was first
placed in the fear-conditioning chamber containing the same exact context, but
with no administration of a CS or foot shock. Freezing was analysed for 1-3 min.
One hour later, the mice were placed in a new context containing a different odour, 
cleaning solution, floor texture, chamber walls and shape. Animals were allowed to 
explore for 2 min before being re-exposed to the CS. Freezing was analysed for 
1-3 min. Freezing was measured using a FreezeScan video tracking system and 
software (Cleversys). 

RAWM. Spatial learning and memory was assessed using the RAWM paradigm 
following the exact protocol described previously”*. The goal arm location contain- 
ing a platform remains constant throughout the training and testing phase, 
whereas the start arm is changed during each trial. On day 1 during the training 
phase, mice are trained for 15 trials, with trials alternating between a visible and 
hidden platform. On day 2 during the testing phase, mice are tested for 15 trials 
with a hidden platform. Entry into an incorrect arm is scored as an error, and 
errors are averaged over training blocks (three consecutive trials). 
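
The block averaging of RAWM errors can be illustrated as follows. This is a sketch only: the function name and the error counts are hypothetical, and it assumes the number of trials is a multiple of the block size (15 trials give five blocks of three).

```python
def block_means(errors_per_trial, block_size=3):
    """Average per-trial error counts over consecutive blocks of trials."""
    if len(errors_per_trial) % block_size != 0:
        raise ValueError("number of trials must be a multiple of block_size")
    return [
        sum(errors_per_trial[i:i + block_size]) / block_size
        for i in range(0, len(errors_per_trial), block_size)
    ]


# Hypothetical error counts for one mouse across 15 trials.
errors = [3, 2, 4, 2, 2, 2, 1, 2, 0, 1, 0, 2, 0, 1, 1]
print(block_means(errors))
```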

Cranial irradiation. Adult mice (8-12 weeks) were sham irradiated (controls) or 
irradiated at 5 Gy three times over 8 days using the Mark I gamma irradiator and 
killed at 8-10 weeks after irradiation to collect brains for immunohistochemical 
analyses. Each mouse was placed in a restrainer that was fitted into a slot in the lead 
brick shield so that the back of the skull was facing the source of radiation when 
positioned in the radiation chamber. The shield is constructed of lead bricks such 
that only the hippocampal/midbrain area was exposed to radiation. Calibration for 
5 Gy radiation was done using nanoDot. Shielded areas were protected with an 
exposure rate ten times lower than the exposed area. RAWM studies were done on 
irradiated mice at least 6 weeks after the radiation procedure. This time frame 
ensured adequate recovery of the animals. All data were from 8 irradiated and 10 
sham-irradiated mice. 

Plasma collection and proteomic analysis. Mouse blood was collected from 400- 
500 young (2-3 months) and old (18-22 months) animals into EDTA-coated tubes 
via tail vein bleed, mandibular vein bleed, or intracardial bleed at time of death. 
EDTA plasma was generated by centrifugation of freshly collected blood and 
aliquots were stored at —80°C until use. Human plasma and CSF samples were 
obtained from academic centres and subjects were chosen based on standardized 
inclusion and exclusion criteria as previously described”*°. The relative plasma 
concentrations of cytokines and signalling molecules were measured in human and 
mouse plasma samples using standard antibody-based multiplex immunoassays 
(Luminex) by either Rules Based Medicine, a fee-for-service provider, or by the 
Human Immune Monitoring Center at Stanford University. All Luminex measurements 
were obtained in a blinded fashion. All assays were developed and validated 
to Clinical Laboratory Standards Institute (formerly NCCLS) guidelines based 
upon the principles of immunoassay as described by the manufacturers. 

CCL11, MCSF, antibody, or plasma administration. Carrier-free recombinant 
murine CCL11 dissolved in PBS (10 µg kg⁻¹; R&D Systems), carrier-free recombinant 
MCSF dissolved in PBS (10 µg kg⁻¹; Biogen), rat IgG2a neutralizing antibody 
against mouse CCL11 (50 µg kg⁻¹; R&D Systems, clone 42285), and isotype-matched 
control rat IgG2a recommended by the manufacturer (R&D Systems, 
clone 54447) were administered systemically via intraperitoneal injection over ten 
days on days 1, 4, 7 and 10. The same reagents (0.50 µl; 0.1 µg µl⁻¹) were also 
administered stereotaxically into the dentate gyrus of the hippocampus in some 
experiments (coordinates from bregma: A = −2.0 mm and L = −1.8 mm; from 
brain surface: H = −2.0 mm). Pooled mouse serum or plasma was collected from 
young (2-3 months) mice and old (18-20 months) mice by intracardial bleed at time 
of death. Serum was prepared from clotted blood collected without anticoagulants; 
plasma was prepared from blood collected with EDTA followed by centrifugation. 
Aliquots were stored at −80 °C until use. Prior to administration plasma was 
dialysed in PBS to remove EDTA. Young adult mice were systemically treated with 
plasma (100 µl) isolated from young or aged mice via intravenous injections four 
times over 10 days. 

In vivo bioluminescence imaging. Bioluminescence was detected with the In vivo 
Imaging System (IVIS Spectrum; Caliper Life Science). Mice were injected intraperitoneally 
with 150 mg kg⁻¹ D-luciferin (Xenogen) 10 min before imaging and 
anaesthetized with isoflurane during imaging. Photons emitted from living mice 
were acquired as photons per second per cm² per steradian (sr) using 
LIVINGIMAGE software (version 3.5, Caliper) and integrated over 5 min. For 
quantification a region of interest was manually selected and kept constant for all 
experiments. 

Cell culture assays. Mouse NPCs were isolated from C57BL/6 mice as previously 
described'*. Brains from postnatal animals (1-day old) were dissected to remove 
the olfactory bulb, cerebellum and brainstem. After removing superficial blood 
vessels, forebrains were finely minced, digested for 30 min at 37 °C in DMEM 
media containing 2.5 U ml⁻¹ papain (Worthington Biochemicals), 1 U ml⁻¹ 
dispase II (Boehringer Mannheim) and 250 U ml⁻¹ DNase I (Worthington 
Biochemicals) and mechanically dissociated. NPCs were purified using a 65% 
Percoll gradient and plated on uncoated tissue culture dishes at a density of 
10° cells cm⁻². NPCs were cultured under standard conditions in NeuroBasal A 
medium supplemented with penicillin (100 U ml⁻¹), streptomycin (100 mg ml⁻¹), 
2 mM L-glutamine, serum-free B27 supplement without vitamin A (Sigma-Aldrich), 
bFGF (20 ng ml⁻¹) and EGF (20 ng ml⁻¹). Carrier-free forms of murine 
recombinant CCL11 (100 ng ml⁻¹; R&D Systems), goat IgG neutralizing antibody 
against mouse CCL11 (10 µg ml⁻¹; R&D Systems), and control goat IgG (10 µg 
ml⁻¹; R&D Systems) were dissolved in PBS and added to cell cultures under 
self-renewal conditions every other day following cell plating. 

Human NTERA cells expressing eGFP under the Dcx promoter were cultured 
under standard self-renewal and differentiation conditions. Carrier-free forms 
of human recombinant CCL11 (100 ng ml⁻¹; R&D Systems), mouse IgG1 neutralizing 
antibody against human CCL11 (25 µg ml⁻¹; R&D Systems, clone 43911) 
and control mouse IgG1 (25 µg ml⁻¹; R&D Systems) were added to cell cultures 
under differentiation conditions every other day following cell plating. 
Data and statistical analysis. Data are expressed as mean ± s.e.m. Statistical analysis 
was performed with Prism 5.0 software (GraphPad Software). Means between 
two groups were compared with two-tailed, unpaired Student’s t-test. Comparisons 
of means from multiple groups with each other or against one control group were 
analysed with one-way ANOVA and Tukey-Kramer’s or Dunnett’s post-hoc tests, 
respectively. Plasma protein correlations in the ageing samples were analysed with 
the Significance Analysis of Microarray software (SAM 3.00 algorithm; http:// 
www.stat.stanford.edu/~tibs/SAM/index.htm). Unsupervised cluster analysis 
was performed using Gene Cluster 3.0 software and node maps were produced 
using Java TreeView 1.0.13 software. All histology, electrophysiology and behaviour 
experiments conducted were done in a randomized and blinded fashion. 
Cell-to-cell spread of HIV permits ongoing 
replication despite antiretroviral therapy 


Alex Sigal, Jocelyn T. Kim, Alejandro B. Balazs, Erez Dekel, Avi Mayo, Ron Milo & David Baltimore 


Latency and ongoing replication’ have both been proposed to 
explain the drug-insensitive human immunodeficiency virus 
(HIV) reservoir maintained during antiretroviral therapy. Here 
we explore a novel mechanism for ongoing HIV replication in 
the face of antiretroviral drugs. We propose a model whereby 
multiple infections*’ per cell lead to reduced sensitivity to drugs 
without requiring drug-resistant mutations, and experimentally 
validate the model using multiple infections per cell by cell-free 
HIV in the presence of the drug tenofovir. We then examine the 
drug sensitivity of cell-to-cell spread of HIV*’, a mode of HIV 
transmission that can lead to multiple infection events per target 
cell®°. Infections originating from cell-free virus decrease 
strongly in the presence of antiretrovirals tenofovir and efavirenz 
whereas infections involving cell-to-cell spread are markedly less 
sensitive to the drugs. The reduction in sensitivity is sufficient to 
keep multiple rounds of infection from terminating in the presence 
of drugs. We examine replication from cell-to-cell spread in the 
presence of clinical drug concentrations using a stochastic infec- 
tion model and find that replication is intermittent, without sub- 
stantial accumulation of mutations. If cell-to-cell spread has the 
same properties in vivo, it may have adverse consequences for the 
immune system’, lead to therapy failure in individuals with risk 
factors, and potentially contribute to viral persistence and hence 
be a barrier to curing HIV infection. 

Current antiretroviral therapy (ART) does not cure HIV infection 
because low-level viraemia persists from virus reservoirs that are 
insensitive to ART’. The reservoirs may be long-lived infected cells, 
cells with latent virus, ongoing cycles of infection termed ongoing 
replication, or a combination of sources’. How ongoing replication 
might take place in the face of ART has remained unclear. If ART 
succeeds in decreasing ongoing HIV replication to very low levels, why 
does it not eliminate replication completely? Here we explore a novel 
mechanism for ongoing HIV replication in the presence of ART. 

Multiple infections of one cell may propagate at drug concentrations 
where infection by single particles would die out: if more virions are 
transmitted per cell, the probability that at least one of the virions 
escapes the drug should increase (Fig. 1a). To model the effect of mul- 
tiple infections on drug sensitivity (Supplementary Theory, section 1), 
we assume infections by individual virions are independent events, each 
with a probability of escaping the drug and succeeding in infecting the 
cell. To quantify infection sensitivity to drugs, we introduce the transmission 
index ($T_x$), which we define as the fraction of cells infected in 
the presence of drug ($I_d$) divided by the fraction of cells infected in the 
absence of drug ($I$). Given: (1) a multiplicity of infection of $m$ infectious 
units per cell, where $m$ is defined as the product of virus particle number 
and the probability of infection per virus particle; (2) a concentration of 
antiretroviral agent $d$ that reduces $m$ by factor $f(d)$, where $f(d) \geq 1$. 
Under these conditions, the transmission index is: 

$$T_x = \frac{I_d}{I} = \frac{1 - e^{-m/f(d)}}{1 - e^{-m}} \qquad (1)$$

$T_x$ has two important limiting regimes: $m \ll 1$, in which case 
$T_x \approx 1/f(d)$, and $m/f(d) \gg 1$, in which case $T_x \approx 1$. In the first case, 
where few viruses infect each cell, the infection is sensitive to the effect 
of the drug, whereas in the second, where many viruses infect each cell, 
the infection is insensitive. 
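
The two limiting regimes can be checked numerically. The sketch below evaluates the transmission index as the ratio of Poisson infection probabilities with and without drug; the function name and the value f = 30 are illustrative assumptions, not values taken from the experiments.

```python
import math


def transmission_index(m, f):
    """Transmission index: (1 - exp(-m/f)) / (1 - exp(-m)).

    m: multiplicity of infection (infectious units per cell), m > 0.
    f: fold reduction of m caused by the drug, f >= 1.
    """
    if m <= 0 or f < 1:
        raise ValueError("require m > 0 and f >= 1")
    # expm1(x) = exp(x) - 1 keeps precision when m is small;
    # the two minus signs cancel in the ratio.
    return math.expm1(-m / f) / math.expm1(-m)


f = 30.0  # hypothetical fold reduction, for illustration only
for m in (0.001, 0.2, 100.0):
    print(m, transmission_index(m, f))
```

With a small m the result approaches 1/f, and with m/f well above 1 it approaches 1, matching the two regimes described in the text.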

To test this, we infected the highly infection-permissive MT-4 T-cell 
line with cell-free HIV encoding yellow fluorescence protein (YFP)"* at 
low (0.2) and high (100) m in the presence of tenofovir (TFV), a 
nucleotide reverse transcriptase inhibitor. We determined infected cell 
number by YFP fluorescence (Supplementary Fig. 1) and observed that 
infection with cell-free virus at low m was sensitive to TFV across the 
range of concentrations used. At high m, infection was insensitive to 
low and intermediate TFV concentrations (Fig. 1b), supporting the 
model. Thus, multiple cell-free HIV infections per cell recapitulate the 
insensitivity to drug of an HIV reservoir. 

Figure 1 | Multiple infections per cell decrease sensitivity to drug. 
a, Hypothesis. Red circles indicate infected cells, arrows indicate transmissions, 
hexagons or hexagons surrounded by circles indicate viruses, broken circles 
indicate degraded viruses, crosses indicate viruses blocked by drug and wavelets 
indicate successful infection. b, MT-4 cells were pre-incubated with TFV and 
infected with HIV coding for YFP. Infection multiplicity m was 0.2 (blue 
squares) or 100 (red squares). Lines are a guide for the eye. Mean ± standard 
deviation (s.d.) of replicates (n = 3). Circles represent calculated values of $T_x$ at 
m = 100 according to equation (1) with f(d) at each drug concentration 
determined empirically at m = 0.2. 

Multiple infections occur in vivo and in culture and are 
thought to be associated with cell-to-cell spread, a directed transmission 
mode that minimizes the number of virus particles failing to 
reach the target cell. We therefore used co-culture with infected cells to 
generate cell-to-cell spread and compared drug sensitivity to infection 
with cell-free virus. Infection by co-culture occurs both by cell-free 
virus shed by infected donor cells and by cell-to-cell spread. 
Administration of cell-free virus lacks a cell-to-cell component—the 
measured average virus cycle time (1.7 days; Supplementary Fig. 2) 
would rarely permit cell-free virus infected cells to complete a second 
round of infection during the experiment (2 days). Therefore, we 
compared cell-free virus infection and the combination of cell-free 
virus infection and cell-to-cell spread resulting from co-culture. We 
used drugs that act far downstream of entry, to ensure any differences 
between cell-to-cell and cell-free infection are not due to factors that 
physically inhibit drug action in cell-to-cell spread. 

We infected peripheral blood mononuclear cells (PBMCs) in the 
presence or absence of TFV by co-culture or using cell-free virus. To 
separate donor from target cells in co-culture, we used HLA-A2-negative 
donor cells and HLA-A2-positive targets (Supplementary Fig. 3a). Two 
days post-infection, we determined the fraction of target cells infected 
using p24 intracellular staining of HLA-A2-positive PBMCs (Fig. 2a, top 
panel, controls in Supplementary Fig. 3b). Co-culture dramatically 
decreased sensitivity to drug: TFV decreased cell-free infection ~30-fold 
but caused less than a twofold decrease of co-culture infection (Fig. 2b). 
The decline in HLA-A2 expression in the target cells after infection 
(Supplementary Fig. 3b) is consistent with observations that productive 
HIV infection downregulates HLA’. 

We also used Rev-CEM”* reporter T cells as targets. These cells 
express green fluorescent protein (GFP) in the presence of HIV early 
proteins Tat and Rev (Supplementary Fig. 4). To infect Rev-CEM cells, 
we used either cell-free HIV or co-culture with infected MT-4 cells 
engineered to be >99% mCherry positive (Supplementary Fig. 5). We 
excluded GFP/mCherry double-positive cells from the analysis to 
avoid scoring fused cells as infected (Supplementary Figs 6 and 7). 
This underestimates co-culture infection because it excludes unfused 
cell doublets in the process of virus exchange. 

To block infection, we applied TFV and the non-nucleoside reverse 
transcriptase inhibitor efavirenz (EFV) (Fig. 2a, bottom panel, 
Supplementary Fig. 7). At the highest concentrations used, co-culture 
Tx was over sixfold higher than cell-free infection Tx (Fig. 2b). The 
trend was similar when donors were PBMCs or Rev-CEM cells 
(Supplementary Fig. 8). Co-culture Tx was lower than in PBMC-to- 
PBMC transmission, suggesting that target cells have an important 
role in cell-to-cell spread efficiency. The lower drug sensitivity in co- 
culture was not due to secreted donor cell factors that decrease the 
susceptibility of target cells to drugs (Supplementary Fig. 9). 

We next determined the number of infectious units (m) transmitted. 
For co-culture, m was previously proposed to have a two-peaked 
Poisson distribution, one peak corresponding to cell-free virus or some 
low virus cell-to-cell transmissions, and the second to high virus num- 
ber transmissions*”. We fit a two-peaked Poisson and other distribu- 
tions to the data (Supplementary Theory, section 2). The two-peaked 
Poisson fit the data best (Fig. 2b, dotted line, Supplementary Fig. 10). 
The first peak mean was ~1 infectious unit for both drugs, with 94% 
and 97% of infections in this peak for TFV and EFV, respectively. The 
second peak mean was 73 (TFV) and 175 (EFV), with the remaining 6% 
and 3% of infections in this peak. This predicts that whereas most 
infections are cell-free or low virus cell-to-cell transmissions, a minority 
involve very large numbers of viruses. This might seem to imply large 
numbers of integrations in the absence of drug in the high virus number 
subset. Arguing against this is our observation of a significantly 
increased cell death rate with increasing numbers of multiple infections 
in the absence of drugs (data not shown). Inter-virus interference, such 
as downregulation of CD4 receptors’’, may also limit provirus number. 
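
The two-peaked Poisson distribution can be written as a weighted mixture of two Poisson distributions, with weight a on the second (high virus number) peak. The sketch below evaluates that mixture using the fitted parameters reported in the Fig. 2 legend for TFV; the function names are hypothetical, and the fitting procedure itself (Supplementary Theory, section 2) is not reproduced here.

```python
import math


def poisson_pmf(m, mu):
    """Poisson probability of m events with mean mu, computed in log space."""
    return math.exp(-mu + m * math.log(mu) - math.lgamma(m + 1))


def two_peaked_poisson(m, a, mu1, mu2):
    """Mixture of two Poisson peaks; a is the fraction in the second peak."""
    return (1.0 - a) * poisson_pmf(m, mu1) + a * poisson_pmf(m, mu2)


def mixture_mean(a, mu1, mu2):
    """Mean number of infectious units transmitted under the mixture."""
    return (1.0 - a) * mu1 + a * mu2


# Fitted values reported for TFV in the Fig. 2 legend.
a, mu1, mu2 = 0.06, 1.1, 73.0
print(sum(two_peaked_poisson(m, a, mu1, mu2) for m in range(400)))
print(mixture_mean(a, mu1, mu2))
```

Although only a small fraction of transmissions falls in the second peak, that peak contributes most of the mean, which illustrates why a minority of high-dose transmissions can matter.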
Figure 2 | Cell-to-cell spread reduces sensitivity to drugs. a, Top, infection of 
HLA-A2-positive PBMC targets with cell-free virus (left two plots) or infected 
HLA-A2-negative PBMC donors (right two plots) in the absence or presence of 
10 µM TFV. x-axis is p24, y-axis is HLA-A2 status. Bottom, the number of GFP-positive 
Rev-CEM cells after infection with cell-free virus (left two plots) or 
infected MT-4mCherry donors (right two plots) in the absence or presence of 
60 µM TFV. x-axis is GFP, y-axis is mCherry fluorescence. b, Transmission 
index when infection source was cell-free HIV (blue bars or squares) or co-culture 
with HIV-infected donor cells (red bars or squares). Mean ± s.d. 
(n = 3). Top graph is PBMCs with TFV, middle graph is Rev-CEM cells with 
TFV, bottom graph is Rev-CEM cells with EFV. Black dashed line is best fit of m 
with a two-peaked Poisson distribution described by 

$$P(m; a, \mu_1, \mu_2) = (1 - a)\,\frac{e^{-\mu_1}\mu_1^{m}}{m!} + a\,\frac{e^{-\mu_2}\mu_2^{m}}{m!},$$

where $\mu_1$ and $\mu_2$ are the 
means of the first and second peak respectively, and a is the fraction of 
transmissions that fall within the second peak. For TFV, $\mu_1 = 1.1$, $\mu_2 = 73$, 
a = 0.06. For EFV, $\mu_1 = 0.8$, $\mu_2 = 175$, with a = 0.03. Root mean squared error 
was 0.01 (EFV) to 0.02 (TFV). 
To investigate whether cell-to-cell spread can lead to HIV replication 
through multiple virus cycles with ART, we measured the replication 
ratio (R), defined as fold change in the number of infected cells 
per virus cycle under conditions where target cells are not limiting: 
$(I_k/I_0)^{1/k}$. Here k is the number of elapsed virus cycles, $I_k$ is the number 
of infected cells at virus cycle k, and $I_0$ is the number of infected cells at 
the start. For expanding infections R > 1, whereas infections with 
R < 1 ultimately terminate. Although this assumes synchronized 
virus cycles, we simulated desynchronization and observed that its 
effect was negligible at the measured variability in cycle lengths 
(Supplementary Fig. 11). 
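
As a worked illustration of this definition, the sketch below computes the replication ratio from the number of infected cells at the start and after k virus cycles. The function name and the cell counts are hypothetical; the authors fitted R to daily measurements rather than using two time points.

```python
def replication_ratio(i_0, i_k, k):
    """Per-cycle fold change in infected cells: (I_k / I_0) ** (1 / k)."""
    if i_0 <= 0 or i_k < 0 or k <= 0:
        raise ValueError("require I_0 > 0, I_k >= 0 and k > 0")
    return (i_k / i_0) ** (1.0 / k)


# Hypothetical counts: an infection growing 2.5-fold per cycle for 3 cycles.
print(replication_ratio(100.0, 100.0 * 2.5 ** 3, 3))
# Hypothetical counts: an infection shrinking by 5% per cycle for 10 cycles.
print(replication_ratio(1000.0, 1000.0 * 0.95 ** 10, 10))
```

Values above 1 correspond to an expanding infection and values below 1 to one that eventually terminates.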

To measure R, we tracked infection daily (Methods) in the absence 
of drug, with 100 µM TFV, or with a combination of EFV, TFV and the 
nucleoside reverse transcriptase inhibitor emtricitabine (FTC) at their 
clinical maximum plasma concentrations ($C_{\mathrm{max}}$: 10 µM EFV, 2 µM 
TFV and 10 µM FTC). The fraction of infected cells was kept low 
to ensure that target cells were not limiting. $R_0$, $R_{\mathrm{TFV}}$ and $R_{C_{\mathrm{max}}}$, the 
replication ratios with no drug, TFV or at $C_{\mathrm{max}}$, were fitted from the 
data (Fig. 3a, dashed lines). They were 65, 2.5 and 0.95, respectively. 
$R_{\mathrm{TFV}}$ was significantly greater than 1 (P < 0.01), indicating an expanding 
infection. $R_{C_{\mathrm{max}}}$ was slightly lower than 1 in all experiments 
(Supplementary Fig. 12), indicating an infection slightly below the 
expansion threshold. 

We compared experimentally obtained replication ratios with those 
predicted for the same drug concentrations if cell-free infection were 
the only infection route (Supplementary Theory section 3 and Supplementary 
Fig. 13). In this case we obtained $R_{\mathrm{TFV}}$ = 1.1 and $R_{C_{\mathrm{max}}}$ = 0.60 
(Fig. 3a). The predicted R with no replication, resulting 
solely from infected cell half-life, was 0.46 (Fig. 3a). Predicted cell-free 
replication ratios were significantly lower (P < 0.02 for TFV, P < 0.01 
for $C_{\mathrm{max}}$) than ratios experimentally obtained from co-culture. 

Figure 3 | Co-culture infection dynamics. a, Infection growth rate. Drug 
conditions were: no drug (black squares), 100 µM TFV (red squares) and $C_{\mathrm{max}}$ 
(blue squares). Means ± s.d. of inter-day experiments (n = 3). Dashed lines 
represent fits of $I_k = I_0 R^k$ for each drug condition. Solid lines are predicted 
infection dynamics for infection occurring exclusively by cell-free virus in the 
presence of 100 µM TFV (red line), $C_{\mathrm{max}}$ (blue line), or with no viral replication 
(green line). b, Simulation of the number of infected cells and mutations per cell 
with an input of one infected cell per virus cycle (Methods). x-axis is time, y-axis 
is number of total infected cells (top graph), newly infected cells (middle graph) 
or sum of mutations divided by the sum of total infected cells (bottom graph). 
Average number of mutations per cell over time is 0.6 ± 0.5 (mean ± s.d., 
n = 215). 

Given the lack of evolution in the plasma in individuals with HIV 
successfully suppressed by drugs, ongoing replication can occur if: 
(1) it is compartmentalized to other locations; (2) it is intermittent; 
(3) the circulating virus is at a fitness maximum; or some 
combination of these factors. We obtained $R_{C_{\mathrm{max}}}$ = 0.95. If this is 
extrapolated in vivo, it follows that ongoing replication cannot persist 
independently but may have a role if it interacts with another reservoir 
that primes replication’’. To examine this scenario, we performed a 
stochastic simulation (Methods). As expected for intermittent replica- 
tion, every infection chain that starts from the introduction of an 
infected cell from a different reservoir—for example, reactivation from 
latency—terminates (Supplementary Fig. 14). A constant input of one 
infected cell per virus cycle results in a steady state where substantial 
numbers of newly infected cells are generated, but the average number 
of mutations anywhere on the HIV genome per infected cell is low 
(~1, Fig. 3b). Because each infection chain is independent, these muta- 
tions are expected to be sporadic and not linked by temporal structure. 
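
The steady state reached with a constant input can be illustrated with a deterministic recursion for the expected number of infected cells. This is not the authors' stochastic simulation (which is described in the Methods and also tracks mutations); it is a simplified sketch that assumes the infected-cell count is multiplied by R each virus cycle and that one infected cell is added per cycle. The function name is hypothetical.

```python
def expected_infected_per_cycle(r, input_per_cycle=1.0, cycles=500):
    """Expected infected-cell count per cycle for a subcritical infection
    (0 <= r < 1) that receives a constant input of infected cells."""
    if not 0.0 <= r < 1.0:
        raise ValueError("this sketch only covers 0 <= r < 1")
    level = 0.0
    history = []
    for _ in range(cycles):
        # Existing infected cells change by factor r; new input is added.
        level = r * level + input_per_cycle
        history.append(level)
    return history


history = expected_infected_per_cycle(0.95)
print(history[0], history[-1])
```

Under these assumptions the expected count converges to input/(1 − R), so an R of 0.95 sustains a population roughly twenty times the per-cycle input even though every individual infection chain terminates.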

Evidence for ongoing replication during ART derives from the 
decrease in virus decline rates**, some HIV sequence divergence” 
and long terminal repeat circle formation when the integrase inhibitor 
raltegravir is included in drug regimens"’. At least in some individuals, 
antiretroviral suppression is close to the ongoing replication threshold: 
a mutation conferring very low-level resistance to EFV at therapy 
initiation” is sufficient to cause ongoing replication, as indicated by 
increased virological failure risk'*. Our data indicate that cell-to-cell 
spread is a likely source of intermittent ongoing replication in the face 
of ART, and that this is a consequence of some cell-to-cell infections 
transmitting virus numbers much in excess of what is required to infect 
a cell in the absence of ART. The large transmitted dose strongly 
decreases the probability that every transmitted virus will be inhibited 
by the drugs, and therefore greatly weakens their effect. This replica- 
tion may adversely affect the immune system, increasing activation’ 
and cell death'*, and could potentially contribute to the maintenance of 
an HIV reservoir in locations such as lymphoid tissue where cell-to- 
cell spread occurs. 


METHODS SUMMARY 

HIV infection at high and low m. NL4-3YFP HIV stock at a 1:2,000 or 1:4 final 
dilution was added to MT-4 cells pre-incubated with TFV. Two days post- 
infection, the number of YFP-positive cells was determined by FACS. The multiplicity 
of infection was calculated as $m = -\ln p(0) = -\ln(1 - I_{1:2,000})$, where 
$I_{1:2,000}$ is the fraction of YFP-positive cells at the 1:2,000 dilution. 

Drug sensitivity of co-culture versus cell-free infections. Donor cells were 
infected with NL4-3 strain HIV and incubated for two to three days. Infected 
donor cells or cell-free NL4-3 were then added to target cells. Two days after target 
cell infection, the number of infected cells was determined by FACS using intra- 
cellular p24 staining (PBMCs) or GFP expression (Rev-CEM cells). In all experi- 
ments, uninfected PBMC or MT-4mCherry cells were added to cell-free virus 
infections to keep total cell numbers equal on day 0. 

Infection growth rate. Infection was initiated by adding infected Rev-CEM cells 
to uninfected Rev-CEM cells pre-incubated with drugs. Cells were passaged on 
each day following infected cell addition: infection with no drug was split 1:10 into 
fresh Rev-CEM cells. For 100 µM TFV or $C_{\mathrm{max}}$, infected cells were split 0.6:1 with 
drug-containing medium. Cells remaining after split were used to quantify the 
fraction of infected cells by FACS. The fold change in infected cells on each day was 
calculated as $N_k D_k / N_0$, where $N_k$ is the fraction of infected cells on day k, $D_k$ is total 
dilution factor (split) up to day k and $N_0$ is the fraction of infected cells on day 1. 
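
The dilution-corrected fold change can be sketched as below. The function name, the measured fractions and the per-day dilution factors are hypothetical; the sketch assumes each day's fraction is measured before that day's split, so the cumulative dilution at day k is the product of all earlier splits.

```python
def fold_changes(fractions, daily_dilutions):
    """Fold change N_k * D_k / N_0 for each measurement.

    fractions[0] is the reference fraction N_0; fractions[k] is N_k.
    daily_dilutions[j] is the dilution factor applied between
    measurement j and measurement j + 1.
    """
    if len(daily_dilutions) != len(fractions) - 1:
        raise ValueError("need one dilution factor per interval")
    n_0 = fractions[0]
    cumulative = 1.0  # D_k, the total dilution up to the current day
    result = [1.0]
    for n_k, dilution in zip(fractions[1:], daily_dilutions):
        cumulative *= dilution
        result.append(n_k * cumulative / n_0)
    return result


# Hypothetical fractions of infected cells with a 10-fold split each day.
print(fold_changes([0.002, 0.004, 0.003], [10.0, 10.0]))
```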


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 


Cells, viruses and drugs. The following were obtained through the AIDS Research 
and Reference Reagent Program, National Institute of Allergy and Infectious 
Diseases, National Institutes of Health: Rev-CEM cells from Y. Wu and J. Marsh; 
MT-4 cells from D. Richman; HIV expression plasmid pNL4-3 from M. Martin; 
TFV; EFV. The NL4-3YFP molecular clone was a gift from D. Levy. Cell-free virus 
was produced by transfection of HEK293 cells with virus coding plasmid using 
Fugene 6 or Fugene HD (Roche). Supernatant containing shed virus was harvested 
after two days of incubation. Number of virus genomes of viral stock was determined 
using the RealTime HIV-1 Viral Load test (Abbott Diagnostics, Abbott Park, 
Illinois) and gag p24 content was determined by ELISA (Perkin-Elmer) at the ARI-UCSF 
Laboratory of Clinical Virology. The MT-4mCherry cell line was created by 
infecting MT-4 cells with a pHAGE2 lentiviral vector expressing mCherry under 
the control of the EF1α promoter. To obtain a 0.99 fraction of mCherry-positive 
cells, MT-4mCherry cells were used fresh after lentiviral infection without a cycle of 
freezing and thawing, minimizing the number of population doublings and con- 
sequent decrease in the mCherry-positive fraction. Anonymous PBMCs or peri- 
pheral blood samples were provided by AllCells (PBMCs) or the UCLA Center for 
AIDS Research (CFAR) Virology Core Lab (peripheral blood). For whole blood, 
PBMCs were purified by Ficoll gradient using standard techniques. Purified PBMCs 
were activated with 5 µg ml⁻¹ PHA in the presence of 5 ng ml⁻¹ IL-2 for 1 (donors) 
or 3 (targets) days. All work was approved by the California Institute of Technology 
Institutional Biosafety Committee and Institutional Review Board exempt. 
NL4-3YFP infection at high and low m. MT-4 cells were pre-incubated for 24h 
with varying concentrations of TFV. NL4-3YFP stock was produced using trans- 
fection of HEK293 cells at 80% confluence with Fugene HD. Virus supernatant 
was collected 2 days post-transfection and added fresh to maximize the number of 
infectious units. Fresh virus stock was used at a 1:2,000 (low m) or 1:4 final dilution 
(high m). After 2 days incubation with virus, the number of YFP-positive MT-4 
cells was quantified by flow cytometry by collecting 2X 10° cells using a 
FACSCalibur machine (Becton Dickinson). The multiplicity of infection was calculated 
using Poisson statistics: $m = -\ln p(0) = -\ln(1 - I_{1:2,000})$, where p(0) is 
the fraction of YFP-negative cells, and $I_{1:2,000}$ is the fraction of YFP-positive cells at 
the 1:2,000 dilution. 
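
The Poisson estimate of m can be sketched as follows. Under Poisson statistics the uninfected fraction is p(0) = exp(−m), so m is recovered from the infected fraction at the dilute condition. The function names are hypothetical, and the second helper assumes that m scales inversely with the dilution of the same virus stock, an assumption consistent with the reported values of 0.2 (1:2,000) and 100 (1:4) but not stated explicitly in the Methods.

```python
import math


def moi_from_fraction_infected(fraction_infected):
    """m = -ln(p(0)) = -ln(1 - I), with I the fraction of infected cells."""
    if not 0.0 <= fraction_infected < 1.0:
        raise ValueError("fraction_infected must be in [0, 1)")
    # log1p(-x) = ln(1 - x), accurate for small x.
    return -math.log1p(-fraction_infected)


def scale_moi(m_reference, reference_dilution, new_dilution):
    """Extrapolate m to another dilution, assuming m is proportional
    to the amount of virus stock added (assumption, see lead-in)."""
    return m_reference * reference_dilution / new_dilution


# Hypothetical infected fraction at the 1:2,000 dilution.
m_low = moi_from_fraction_infected(0.18)
print(m_low, scale_moi(m_low, 2000.0, 4.0))
```

Estimating m at the dilute condition matters because at high m nearly every cell is infected and the infected fraction no longer carries information about m.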

Comparison of co-culture and cell-free infections in PBMCs. For PBMC infec- 
tions, 1.5 X 10° PHA-activated HLA-A2-negative donor PBMCs at 10° cells ml’ 
were either infected with 700 ng HIV (NL4-3 strain), or mock infected with the 
same volume of growth medium. Cells were then incubated for 2 days. Two days 
after donor-cell infection, PHA-activated HLA-A2-positive PBMC target cells at 
10° cells ml⁻¹ were either treated with no drug or 10 µM TFV. The stock of target 
cells with or without drug was then split into wells at 10° cells well⁻¹ and incubated 
for 4 h. After target cell incubation, HLA-A2-negative donor PBMCs were washed, 
counted, diluted to 10° cells ml⁻¹ and added to target cells at an approximately 
1:10 donor:target ratio as follows. For cell-free infection, each well received 100 µl 
mock-infected HLA-A2-negative donor PBMCs and 150 µl (250 ng) cell-free 
NL4-3. For co-culture infection, each well received 100 µl infected HLA-A2-negative 
donor PBMCs and 150 µl growth medium. One day after target-cell 
infection, cell aggregates were broken up by repeated pipetting, and cells split 
1:2 with fresh growth medium containing the corresponding drug concentration. 
Two days after target-cell infection, the number of infected target cells was deter- 
mined: cells were stained with PE-conjugated anti-HLA-A2 antibody (BD 
Biosciences or Biolegend), fixed and permeabilized (Cytofix/cytoperm kit, BD 
Biosciences), then stained with intracellular FITC-conjugated anti-HIV p24 
antibody (clone KC57, Coulter Corporation) according to the Cytofix/cytoperm 
kit protocol. The fraction of infected target cells was quantified by FACS as HLA- 
A2, p24 double-positive cells. We observed that PBMCs were infected best when 
fresh, and use of previously frozen material or cells whose processing was delayed 
substantially reduced both cell-free and co-culture infections. 

Comparison of co-culture and cell-free infections using Rev-CEM cells. We 
infected Rev-CEM target cells either by co-culture with MT-4mCherry donor cells 
or cell-free virus. MT-4mCherry donor cells at 4 × 10° cells ml⁻¹ were infected 
with 300 ng ml⁻¹ p24 NL4-3, or mock infected with the same volume of growth 
medium. Donor cells were then incubated for 3 days. Two days after donor-cell 
infection and one day before target-cell infection, Rev-CEM target cells at 8 × 10° 
cells ml⁻¹ were treated with no drug, TFV, or EFV. The stock of target cells with or 
without drug was then split into wells at 1.6 × 10° cells well⁻¹ and incubated for 
24 h. Three days after donor-cell infection, MT-4mCherry donor cells were 
washed, counted, diluted to 3 × 10° cells ml⁻¹ and added at an approximately 
1:100 donor:target ratio as follows. For cell-free infection, each well received 
100 µl mock-infected MT-4mCherry donor cells and 600 µl (1 µg) cell-free 
NL4-3. For co-culture infection, each well received 100 µl infected MT-4mCherry 
donor cells and 600 µl growth medium. One day after target-cell infection, 
cell aggregates were broken up by repeated pipetting, and cells split 1:2 with 
fresh growth medium containing the corresponding drug concentration. Two days 
after target-cell infection, the number of infected target cells was quantified by 
FACS by mCherry and GFP fluorescence. Infected target cells were gated as 
positive for GFP, and negative for mCherry, thereby excluding uninfected Rev- 
CEM cells (GFP negative), MT-4mCherry cells (GFP negative, mCherry positive), 
and fusions between MT-4mCherry and Rev-CEM cells (GFP positive, mCherry 
positive). The fraction of MT-4mCherry donors was 1% on day 0 for both mock- 
infected and infected cells, but decreased for infected MT-4mCherry cells by the 
end of the target-cell infection, probably owing to the cytotoxicity of infection. To 
ensure that the low numbers of infected target cells gave repeatable results, we 
averaged consecutive independent inter-day experiments. 
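
The gating rule in the preceding paragraph can be sketched in a few lines of Python. This is only an illustration, not the authors' analysis pipeline: it assumes each FACS event has already been thresholded into GFP and mCherry positive/negative calls, and the function names and the choice of denominator (all recorded events) are hypothetical.

```python
from collections import Counter


def classify_event(gfp_positive: bool, mcherry_positive: bool) -> str:
    """Assign one FACS event to a population using the GFP/mCherry gates."""
    if gfp_positive and not mcherry_positive:
        return "infected target"          # GFP positive, mCherry negative
    if gfp_positive and mcherry_positive:
        return "donor-target fusion"      # GFP positive, mCherry positive
    if mcherry_positive:
        return "MT-4mCherry donor"        # GFP negative, mCherry positive
    return "uninfected Rev-CEM"           # GFP negative, mCherry negative


def infected_target_fraction(events) -> float:
    """Fraction of all events gated as infected targets (assumed denominator)."""
    counts = Counter(classify_event(gfp, mch) for gfp, mch in events)
    total = sum(counts.values())
    return counts["infected target"] / total if total else 0.0
```

Passing a list of `(gfp_positive, mcherry_positive)` pairs returns the gated fraction; donors and fusions are excluded from the numerator exactly as in the text.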

Infection growth rate. To initiate infection, Rev-CEM cells at 4 × 10° cells ml⁻¹
were infected with 300 ng ml⁻¹ NL4-3 in the absence of drugs and incubated for
three days. Two days post-infection, uninfected Rev-CEM cells at 8 × 10° cells
ml⁻¹ were pre-treated with no drug, 100 µM TFV, or a combination of EFV, TFV
and FTC at their clinical maximum plasma concentrations (Cmax: 10 µM EFV,
2 µM TFV and 10 µM FTC). Three days after the initial infection with cell-free
virus, the infected Rev-CEM cells were washed and added, at a final fraction of
0.2% GFP-expressing donor cells, to the uninfected Rev-CEM cells incubated with
no drug, TFV or Cmax. On each day after infected donor addition, cell aggregates
were broken up by gentle repeated pipetting and cells split. Infection conditions
were calibrated so that the number of uninfected target cells would not be limiting
and infection would not interfere with proliferation of uninfected cells. Infection
was therefore kept below ~0.5% GFP-expressing infected Rev-CEM cells. The
daily cell dilution was calibrated to keep this steady state of infected cells: the
sample with no drug was split 1:10 or 1:20 into fresh Rev-CEM cells in a new well.
For 100 µM TFV or Cmax drug concentrations, infected cells were split 0.6:1 with
drug-containing medium into a new well. Cells remaining after cell split were used
to quantify the fraction of infected cells by FACS (5 × 10° collected per sample).
The fold change in infected cells on each day was calculated as $N_k D_k / N_1$, where $N_k$
is the fraction of infected cells on day k, $D_k$ is the total dilution factor up to day k
and $N_1$ is the fraction of infected cells on day 1. The drug effect on a single round of
cell-free infection for 100 µM TFV or Cmax was measured at the same time as the
infection growth rate to prevent differences in drug stock batch or cells.
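
As a worked illustration of the fold-change calculation, the sketch below accumulates the daily dilution factors into the total dilution up to each day and applies the formula from the text. The function and parameter names are hypothetical, and it assumes that a 1:10 split contributes a dilution factor of 10 and that the total dilution on day 1 is 1.

```python
def fold_changes(fractions, split_factors):
    """Return N_k * D_k / N_1 for each day k.

    fractions      -- measured infected fractions N_1, N_2, ... (one per day)
    split_factors  -- dilution applied after each day's measurement, so
                      len(split_factors) == len(fractions) - 1
    """
    if len(split_factors) != len(fractions) - 1:
        raise ValueError("need one split factor between consecutive days")
    results = []
    total_dilution = 1.0                      # D_1 = 1 (no dilution yet)
    for day_index, fraction in enumerate(fractions):
        if day_index > 0:
            total_dilution *= split_factors[day_index - 1]
        results.append(fraction * total_dilution / fractions[0])
    return results
```

If the measured fraction stays at the same steady-state value while the culture is split 1:10 every day, the fold change grows tenfold per day, which is the behaviour the daily dilution is meant to reveal.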

Stochastic simulation of the number of infected cells and mutations. The
purpose of the simulation was to determine the sum of total infected cells, newly
infected cells, and mutations at each virus cycle (measured as 1.7 days
(Supplementary Fig. 2)) from overlapping infection chains. A new infection chain
was initiated each virus cycle with an input of one infected cell. The number of
infected cells in cycle k+1 generated by infected cell j in cycle k was an integer
$x_1^{j} + x_2^{j}$, where $x_1^{j}$ was a random number from a Poisson distribution with
an average $\mu_1 = 0.46$, defined by the measured infected-cell half-life
(Supplementary Fig. 13 and Supplementary Theory, section 3), and $x_2^{j}$ was a
random number from a Poisson distribution with an average $\mu_2$ defined by
$\mu_2 = R_{C_{\max}} - \mu_1 = 0.49$ (Supplementary Theory, section 3). Given an outcome of
N infected cells in cycle k, the number of total infected cells in virus cycle k+1 in
the infection chain was $\sum_{j=1}^{N} (x_1^{j} + x_2^{j})$, of which the number of newly infected cells
was $\sum_{j=1}^{N} x_2^{j}$. A new infection chain from an input of one infected cell was generated
every virus cycle. Therefore, infection chains overlapped, and the total output
number of infected cells in virus cycle k+1 was a sum of infected cells at that virus
cycle from M infection chains: $\sum_{i=1}^{M} \sum_{j=1}^{N_i} (x_1^{ij} + x_2^{ij})$. The number of newly infected
cells was $\sum_{i=1}^{M} \sum_{j=1}^{N_i} x_2^{ij}$. If a new infection occurred, the probability of a mutation
occurring at any one of the $10^4$ nucleotides of the HIV genome was
$1 - (1 - 3.4 \times 10^{-5})^{10^4} = 0.29$, where $3.4 \times 10^{-5}$ is the per-base probability of mutation
for the HIV reverse transcriptase, and $(1 - 3.4 \times 10^{-5})^{10^4}$ is the probability that
no mutations occur during a single reverse transcription event. As a simplifying
assumption, no fitness benefit or cost was assigned to individual mutations.
Therefore, $R_{C_{\max}}$ did not change during the course of the simulation. Surviving cells
carried over their mutations to the next generation, and newly infected cells carried
over mutations from the infected donor cells, in addition to any mutations generated
during the infection process.
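
A minimal Python sketch of this kind of simulation is given below, under stated simplifications. Because chains and cells are independent, the sum over overlapping chains is modelled as one pooled population that receives one new infected cell per cycle. The sketch counts only how many new infections acquire at least one mutation and does not track inheritance of mutations. All names are hypothetical and the Poisson sampler is a standard textbook method, not necessarily what the authors used.

```python
import math
import random

MU_SURVIVING = 0.46         # mean surviving cells per infected cell per cycle
MU_NEW = 0.49               # mean newly infected cells per infected cell per cycle
PER_BASE_MUTATION = 3.4e-5  # per-base mutation probability (reverse transcriptase)
GENOME_LENGTH = 10**4       # nucleotides in the HIV genome


def mutation_probability(per_base=PER_BASE_MUTATION, length=GENOME_LENGTH):
    """Probability of at least one mutation in one reverse transcription event."""
    return 1.0 - (1.0 - per_base) ** length


def poisson_sample(rng, mean):
    """Draw one Poisson variate using Knuth's multiplication method."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate(n_cycles, seed=0):
    """Return (total infected, newly infected, mutated new infections) per cycle."""
    rng = random.Random(seed)
    p_mut = mutation_probability()
    infected = 0
    history = []
    for _ in range(n_cycles):
        infected += 1  # a new infection chain starts with one infected cell
        surviving = sum(poisson_sample(rng, MU_SURVIVING) for _ in range(infected))
        newly = sum(poisson_sample(rng, MU_NEW) for _ in range(infected))
        mutated = sum(rng.random() < p_mut for _ in range(newly))
        infected = surviving + newly
        history.append((infected, newly, mutated))
    return history
```

With a combined mean of 0.95 progeny per infected cell, the pooled population in this sketch does not grow without bound; it fluctuates around a level set by the constant input of one cell per cycle.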

Intravenous delivery of a multi-mechanistic
cancer-targeted oncolytic poxvirus in humans 


Caroline J. Breitbach, James Burke, Derek Jonker, Joe Stephenson, Andrew R. Haas, Laura Q. M. Chow, Jorge Nieva,
Tae-Ho Hwang, Anne Moon, Richard Patt, Adina Pelusio, Fabrice Le Boeuf, Joe Burns, Laura Evgin, Naomi De Silva,
Sara Cvancic, Terri Robertson, Ji-Eun Je, Yeon-Sook Lee, Kelley Parato, Jean-Simon Diallo, Aaron Fenster,
Manijeh Daneshmand, John C. Bell & David H. Kirn


The efficacy and safety of biological molecules in cancer therapy, 
such as peptides and small interfering RNAs (siRNAs), could be 
markedly increased if high concentrations could be achieved and 
amplified selectively in tumour tissues versus normal tissues after 
intravenous administration. This has not been achievable so far in 
humans. We hypothesized that a poxvirus, which evolved for 
blood-borne systemic spread in mammals, could be engineered 
for cancer-selective replication and used as a vehicle for the intra- 
venous delivery and expression of transgenes in tumours. JX-594 is 
an oncolytic poxvirus engineered for replication, transgene 
expression and amplification in cancer cells harbouring activation 
of the epidermal growth factor receptor (EGFR)/Ras pathway, fol- 
lowed by cell lysis and anticancer immunity’. Here we show in a 
clinical trial that JX-594 selectively infects, replicates and expresses 
transgene products in cancer tissue after intravenous infusion, in a 
dose-related fashion. Normal tissues were not affected clinically. 
This platform technology opens up the possibility of multifunc- 
tional products that selectively express high concentrations of 
several complementary therapeutic and imaging molecules in 
metastatic solid tumours in humans. 

Despite recent advances in cancer treatment, truly innovative 
approaches are required to move beyond the modest benefits achieved 
to date. One novel strategy is the use of replication-competent oncolytic 
viruses that selectively infect tumours**. Vaccinia and other poxviruses 
have several biological properties that make them ideally suited for intra- 
venous delivery and subsequent amplification of transgenes within 
tumours'. First, vaccinia has evolved mechanisms for intravenous 
stability and spread to distant tissues, including resistance to antibody- 
and complement-mediated neutralization in the blood**. Second, vac- 
cinia has also evolved for rapid and motile spread within tissues®”’. 
Because of their relatively large size, vaccinia virions may preferentially 
deposit in tumours, where neovasculature has increased permeability. 
Finally, the replication of vaccinia virus is dependent on EGFR/Ras 
pathway signalling*’ which is commonly activated in epithelial cancers’”. 

JX-594 is a Wyeth strain vaccinia-vaccine-derived oncolytic virus 
engineered for viral thymidine kinase gene inactivation, and expres- 
sion of transgenes encoding human granulocyte-macrophage colony 
stimulating factor (GM-CSF, encoded by CSF2) and β-galactosidase
(β-gal, encoded by lacZ) under control of the synthetic early-late and
p7.5 promoters, respectively'!”*. Selective replication of the virus in 
cancer cells is driven by cellular EGFR/Ras pathway signalling, thymi- 
dine kinase elevation and type-1 interferon resistance’””’. In a Phase 1 
trial of intratumoural injection into liver tumours, JX-594 was well- 
tolerated and associated with replication, expression of biologically 
active hGM-CSF and tumour destruction”. 


The clinical trial described herein was designed to test whether JX- 
594 could infect metastatic tumours after intravenous infusion in 
patients. First, we assessed JX-594 selectivity for tumour tissue after 
ex vivo infection of paired explants of viable tumour and adjacent 
normal tissue obtained from patients undergoing surgery. Within 
24h of exposure, JX-594 was able to infect tumour tissue selectively 
in seven of ten samples: most tumours had high-intensity staining 
whereas normal tissues did not (Fig. 1). Peripheral blood mononuclear 
cells were also highly resistant to infection (data not shown). 

We subsequently performed a Phase 1 dose-escalation trial of a 
single intravenous infusion of JX-594 in 23 patients with advanced, 
treatment-refractory solid tumours (Table 1). Patients were treated in 
one of six dose cohorts (1 × 10⁵ to 3 × 10⁷ plaque-forming units
(p.f.u.) kg⁻¹). JX-594 delivery, gene expression and replication in solid
tumours were assessed. Safety (including determination of the max- 
imum tolerated dose (MTD) or maximum feasible dose (MFD)), phar- 
macokinetics and antitumour activity were also evaluated. 
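
The relationship between weight-based and fixed doses in this trial can be checked with simple arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the 66.7 kg reference weight is the value given in the Table 1 footnote for converting the fixed expansion-cohort dose to an approximate per-kilogram dose.

```python
REFERENCE_WEIGHT_KG = 66.7  # reference weight stated in the Table 1 footnote


def total_dose(pfu_per_kg, weight_kg=REFERENCE_WEIGHT_KG):
    """Total p.f.u. delivered for a weight-based dose."""
    return pfu_per_kg * weight_kg


def per_kg_dose(total_pfu, weight_kg=REFERENCE_WEIGHT_KG):
    """Approximate p.f.u. per kg for a fixed total dose."""
    return total_pfu / weight_kg


if __name__ == "__main__":
    # Fixed dose of 1e9 p.f.u. at the reference weight: about 1.5e7 p.f.u. per kg
    print(f"{per_kg_dose(1e9):.2e}")
    # Top weight-based dose of 3e7 p.f.u. per kg: about 2e9 p.f.u. in total
    print(f"{total_dose(3e7):.2e}")
```

For lighter or heavier patients the fixed dose corresponds to a higher or lower per-kilogram dose, which is why the per-kilogram figure for the expansion cohort is quoted as approximate.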

For pharmacokinetic analyses, quantitative PCR (qPCR) was used to 
measure genome concentrations in blood during the 1 h of intravenous 

Figure 1 | Ex vivo infection of explants of tumour and normal tissue from
patients reveals tumour-selective JX-594 gene expression. JX-594 expressing
green fluorescent protein (GFP) (JX-594-GFP⁺/β-gal⁻) was used to infect
primary live-tissue specimens from cancer patients undergoing surgical
resection. Matched tumour and adjacent normal tissues were infected
overnight to assess the selectivity of transgene expression and replication, or
were treated with PBS as a negative control. GFP expression from
JX-594-GFP⁺/β-gal⁻-infected cells was assessed using a fluorescence microscope.

N = normal tissue, C = cancer tissue. 

Table 1 | Overview of patient characteristics, JX-594 delivery to tumours and antitumour activity

Patient characteristics | JX-594 delivery, replication | Anti-tumour activity

Pt | Tumour type | Dose (p.f.u. kg⁻¹) | Age (yrs) | Gender | Previous lines of therapy | Metastatic tumour sites | Sum LD (cm) | BL NAb | Anti-β-gal (day 29) | Biopsy (day 8-10) IHC | Biopsy (day 8-10) PCR | DC* (RECIST) | DC duration | Mod. Choi
O01 Lung (NSCLC) 1 x 10° 64 F 2 6 16.2 EG EG EG NEG No (PD) A NR 
02 Colorectal 1 x 10° 72 5 3 14.7 POS EG EG NEG No (PD) A NR 
03 Lung (NSCLC) 1 x 10° 59 F 5 4 3a EG EG EG NEG Yes (SD) >10 weeks NR 
04 Colorectal 1 x 10° 80 F 0 6 22.2 EG EG EG NEG No (PD) A NR 
05 elanoma 1x 10° 76 1 9 26.5 EG EG EG NEG No (PD) A ND 
06 Thyroid 1x 10° 66 2 1 3.2 POS EG EG NEG Yes(SD) >4weeks NR 
O07. Lung(NSCLC) 3x10 67 F 4 6 10 POS EG EG NEG Yes (SD) >4weeks NR 
08 Pancreatic 3 x 10° 75 F 3 4 7.1 EG POS EG NEG No (PD) A R 
09 elanoma 3 x 10° 52 3 5 13 POS EG EG NEG Yes(SD) 4 weeks R 
) Colorectal 3x 10° 64 2 1 5.4 EG EG EG NEG Yes (SD) >10 weeks R 
1 Ovarian 1x10’ 60 F 5. 4 44 EG EG EG NEG Yes(SD) >4weeks NR 
2 elanoma 1 x10’ 60 4 3 37.4 EG ND EG NEG ND D D 
3 Lung (NSCLC) 1 x 107 67 F 3 6 3.4 POS EG EG NEG Yes (SD) 10 weeks R 
5 Gastric 1 x10’ 61 F 5 5 11 EG POS EG NEG No (PD) A R 
4 elanoma 3x10’ 79 F 3 6 5.1 EG POS EG NEG Yes (SD) >4 weeks D 
6 Leiomyosarcoma 3x10’ 55 F 3 6 i7 EG POS POS NEG Yes (SD) >16 weeks R 
7 Lung (NSCLC) 3x10’ 43 M 3 4 3:5 EG POS POS POS No (PD) A D 
8 Ovarian 15x10’ 61 F 7 5 13 EG POS POS POS Yes (SD) >16 weeks D 
20 Colorectal 1.5 x 107 67 F 3 8 22 EG POS POS POS No (PD) A R 
21 Colorectal 1.5 x 107 57 F 2 y/ 78 EG POS POS POS Yes(SD) 4 weeks D 
22 Mesothelioma 1.5 x 107 68 F 2 4 12 EG POS NEG POS Yes (PR) >10 weeks D 
23 Colorectal 1.5 x 107 68 M 3 3 78 POS POS POS NEG Yes(SD) 4 weeks D 
Patient 19 was not evaluable. Anti-β-gal, antibody development to the product of the β-galactosidase marker transgene between baseline and day 29; BL, baseline; DC, disease control; LD, longest diameter of
tumour; Mod. Choi, modified Choi response; NA, not applicable; NAb, neutralizing antibody to JX-594 (due to previous vaccinia vaccination); ND, not determined; NEG, negative; NR, no response; NSCLC, non-small-cell lung cancer; PD, progressive disease; POS, positive; PR, partial response; Pt, Patient number; R, response; SD, stable disease.
* Stable disease or partial response by RECIST criteria at week 4 and/or 10.
† Approximate dosage based on 66.7 kg weight (patients were treated at 1 × 10⁹ p.f.u. (fixed dose)).


{Partial response determined according to modified RECIST for mesothelioma?’. 

infusion and 4 h afterwards. The highest concentrations were detected
during infusion. At doses ≥10⁷ p.f.u. kg⁻¹ (≥5 × 10⁸ p.f.u. dose⁻¹),
genomes were still detectable in blood at 4h (Fig. 2a). Peak concentra- 
tions and area-under-the-curve were both dose-related. JX-594 infec- 
tious units were also detected during intravenous infusion and at 4h in 
high-dose patients (data not shown). 

JX-594 was generally well tolerated. Dose escalation proceeded 
without dose-limiting toxicities. Therefore, more than 3 × 10⁷ p.f.u.
kg⁻¹ (approximately 2 × 10⁹ p.f.u. dose⁻¹) was the MFD; a fixed dose
of 1 × 10⁹ p.f.u. was administered to patients in the expansion cohort
(n = 5 evaluable patients). The most common treatment-related
adverse events were grade 1-2 flu-like symptoms lasting up to 24h: 
fever (78%), chills (56%), fatigue, headache, nausea, hypotension (22% 
each), vomiting (17%), tachycardia, hypertension, anorexia and myalgia 
(13% each). A single grade-1 skin pustule was noted in each of two 
patients one week after infusion and resolved without sequelae. Levels 
of interferon (IFN)-γ, tumour necrosis factor (TNF)-α and interleukin
(IL)-6 increased acutely in a dose-dependent manner (peak, 8 h; resolu- 
tion, day 4); levels of IL-10 increased on days 4-8. In contrast, IL-1 did 
not change significantly and IL-4 decreased transiently (data not shown). 
Neutralizing antibodies to vaccinia were detectable in six patients at 
baseline, and titres increased by day 15 in all high-dose patients. No 
correlation was demonstrated between antibody titres (baseline or 
induced) and JX-594 replication, safety or antitumour activity. 
Shedding to the environment was assessed (Supplementary Discussion). 

Cancer-selective and dose-related JX-594 delivery and replication in 
tumours were demonstrated in biopsies obtained 8-10 days after infusion
(Fig. 2b and Table 1). Of patients treated at ≥1.5 × 10⁷ p.f.u. kg⁻¹


Figure 2 | JX-594 is selectively delivered to, and amplified within, tumours 
after intravenous infusion. a, Acute pharmacokinetics of JX-594 genomes 
(qPCR) after a single intravenous infusion, plotted by dose cohort. Error bars 
are s.e.m. b, Dose-dependent delivery of JX-594, as demonstrated by PCR and/ 
or immunohistochemical (IHC) analysis of tumour biopsies collected 

8-10 days after treatment. c, Dose-dependent induction of antibodies to
β-galactosidase in patients evaluable for this endpoint. ρ = Spearman’s rank
correlation coefficient. The number of patients evaluable for each group (n) is
indicated.


(≥10⁹ p.f.u. dose⁻¹) and evaluable for biopsy analysis, 87% showed
JX-594 positivity by qPCR and/or immunohistochemistry, whereas 
JX-594 was not detected in biopsies collected from subjects treated 
at lower doses. Of note, delivery and replication were demonstrated 
in a patient with baseline antibodies to JX-594/vaccinia (Supplemen- 
tary Fig. 1). Infection resulted in granular cytoplasmic staining by 
immunohistochemistry, indicative of virus replication (‘virus factories’) 
(Fig. 3a, d and Supplementary Fig. 2). Diffuse infection and inter- 
spersed necrosis of malignant glandular structures were also evident. 
Adjacent and intermixed normal tissues were negative for replication 
by immunohistochemistry (Fig. 3i). In immediately adjacent normal 
squamous epithelium, low-level diffuse staining without cytopatho- 
genicity indicated uptake without replication. Staining was absent in 
negative-control biopsy samples (pretreatment tumour from the same 
patient, Fig. 3c, f; no primary antibody, Fig. 3b, e). Three-dimensional 
visualization of JX-594 infection in a tumour gland revealed diffuse 
infection and spread (Fig. 3g, h and Supplementary Movie 1). 
Immunohistochemical staining for β-galactosidase protein, the
product of the lacZ transgene, confirmed expression in JX-594-infected
tumour cells (Fig. 3j-l). In addition, the induction of antibodies to
β-galactosidase was a surrogate marker for JX-594 replication and
transgene expression (β-galactosidase is not present in the product;
high-level expression requires replication). Induction of anti-β-galactosidase
antibody was dose-related, occurring in 100% of high-dose
patients (Spearman’s rank correlation coefficient ρ = 0.975;
P = 0.005) (Fig. 2c). Tumour-biopsy positivity correlated strongly with
β-galactosidase antibody induction (Table 1). Finally, expression of the
second JX-594 transgene (GM-CSF) was assessed. In a previous clinical 
trial, we demonstrated that high-level replication could result in detectable
GM-CSF in blood when other inflammatory cytokines had
returned to baseline (days4-15)"*, despite the relatively short half- 
life of GM-CSF (< 2h)'*. Therefore, detection of GM-CSF in blood 
is indicative of high-level GM-CSF expression, and may be a specific but
insensitive marker of transgene expression. Three patients had significant
increases from baseline in plasma GM-CSF concentrations
(days 4-15; all other inflammatory cytokines returned to baseline in
< 24 h). All three had evidence of β-galactosidase expression, tumour
infection and/or antitumour activity (modified Choi and/or
fluorodeoxy-D-glucose positron emission tomography (FDG-PET) criteria).
GM-CSF-protein-responsive subsets of white blood cells (neutrophils,
eosinophils and monocytes) peaked on days 4-15 (Supplementary
Fig. 3). 

Dose-related antitumour activity was demonstrated (modified 
Choi’® or response evaluation criteria in solid tumours (RECIST)’” 
criteria), and correlated with delivery and replication of JX-594 
(Table 1 and Supplementary Fig. 4). Furthermore, new tumour out- 
growth in the time after treatment was less frequent in patients treated 
with high doses than low doses (Spearman’s rank correlation coefficient 
ρ = −0.872; P = 0.05), indicating suppression of microscopic tumour
foci. Two out of five high-dose patients had antitumour activity by 
FDG-PET (>25% decrease in standardized uptake value). 

Here we report the first reproducible dose-related delivery, replica- 
tion and transgene expression from a viral vector or oncolytic virus in 
metastatic solid tumours in humans after intravenous administration. 
Engineered oncolytic poxviruses such as JX-594 can express several 
complementary therapeutic proteins’* and/or siRNAs’” in metastatic 


Figure 3 | Immunohistochemical staining reveals JX-594 infection and
β-galactosidase expression in tumours. a, Immunohistochemistry for vaccinia
(patient 20, 10 days after treatment). Scale bar, 200 µm. b, Immunohistochemistry,
no primary antibody. Scale bar, 200 µm. c, Immunohistochemistry for vaccinia in
pre-treatment biopsy. Scale bar, 50 µm. d-f, As in a-c for patient 18, biopsy at
8 days after treatment. Scale bars, 50 µm. g, h, Three-dimensional reconstruction
of vaccinia (green) throughout tumour in patient 20. Scale bars, 200 µm.
i, Immunohistochemistry for vaccinia at low magnification (patient 20). Scale bar,
500 µm. Black arrows indicate tumour; red arrows indicate normal tissue.
j, Immunohistochemistry for β-galactosidase (patient 20). Scale bar, 50 µm.
k, Immunohistochemistry for vaccinia. Scale bar, 50 µm. l, Negative control.
Scale bar, 50 µm. Linear adjustment to brightness and contrast was applied to j-l.

tumours systemically and in a cancer-selective fashion; therapeutic
concentrations in tumour tissues should therefore be markedly higher 
than in normal tissues. Incorporation of transgenes encoding marker 
proteins can facilitate the monitoring of product replication and trans- 
gene expression through blood” or radiographic assessments”’”’. 
Repeated intravenous dosing with JX-594 is currently being assessed 
and related oncolytic poxviruses should also be tested”*** . Although 
anti-viral immunity may in theory decrease the efficiency of delivery, 
all patients on this trial had a history of vaccination with live vaccinia 
virus as children, and delivery was demonstrated in a patient with 
neutralizing antibodies present at baseline. In preclinical murine 
tumour models, intravenous vaccinia delivery and efficacy was feasible 
despite high-titre antibodies”. Repeated intravenous delivery may be 
feasible because of the unique biology of vaccinia, including its ability 
to produce ‘stealth’ particles (extracellular enveloped virus) that can 
traffic in blood in the presence of neutralizing antibodies and 
complement*”. In addition, intravenous pharmacological dosing of 
JX-594 constitutes a route and dose that may transiently saturate 
native mechanisms of viral clearance. JX-594 and related poxvirus 
constructs thus represent a novel, systemic, multi-functional cancer- 
biotherapeutic platform. 


METHODS SUMMARY 


Patients. Twenty-three patients with treatment-refractory, histologically con- 
firmed, advanced, metastatic solid tumours were enrolled and received a single 
intravenous infusion of JX-594 at one of six dose levels. This trial was registered 
with clinical trials registration number NCT00625456. 

All patients gave written informed consent according to guidelines on good 

clinical practice. Protocol and consent forms were approved by the United States 
Food and Drug Administration and Health Canada, as well as the Institutional 
Review and Infection-Control Committees at each hospital. An independent data- 
safety monitoring board reviewed the clinical safety data from each patient cohort 
before each of the four dose escalations. 
Tumour biopsy analysis. Biopsies (excisional, core-needle or fine-needle aspirate) 
were obtained from all subjects 8-10 days after treatment, and were formalin-fixed 
and paraffin-embedded. Sections were subjected to haematoxylin and eosin
staining, immunohistochemical staining for JX-594 proteins, and PCR for JX-594
genomes. Immunohistochemistry used anti-vaccinia polyclonal antibody
(Quartett) and a secondary antibody kit (Vectastain, Vector Laboratories). For
detection of β-galactosidase by immunohistochemistry, an anti-β-galactosidase
polyclonal antibody (Abcam) was used. Negative controls were run without primary
antibody and tumours from mice treated with JX-594 were included as positive
controls. For PCR, DNA was extracted from 5 × 10-µm sections using the FormaPure
kit (Agencourt) and amplified using primers corresponding to the vaccinia E3L
gene, TCCGTCGATGTCTACACAGG and ATGTATCCCGCGAAAAATCA, 
designed using Primer3 software”, using QuantiTect SYBR Green PCR kit 
(Qiagen). Stained sections were digitized using the Aperio Scanscope (Aperio) 
and analysed using ImageScope software. Adobe Photoshop CS software (Adobe) 
was used to apply linear adjustments to brightness and contrast across all compared 
stains where indicated. 
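
As a quick sanity check on the two E3L primer sequences quoted above, the following sketch computes their length, GC content and a rough melting temperature. The Wallace rule used here (2 °C per A/T, 4 °C per G/C) is a crude rule of thumb for short oligonucleotides and is an assumption of this example; the text does not state how the authors assessed the primers beyond the use of Primer3. The constant names are hypothetical, since the text does not say which primer is forward and which is reverse.

```python
PRIMER_1 = "TCCGTCGATGTCTACACAGG"
PRIMER_2 = "ATGTATCCCGCGAAAAATCA"


def gc_fraction(sequence: str) -> float:
    """Fraction of bases that are G or C."""
    sequence = sequence.upper()
    return sum(base in "GC" for base in sequence) / len(sequence)


def wallace_tm(sequence: str) -> int:
    """Rough melting temperature in degrees C: 2 per A/T plus 4 per G/C."""
    sequence = sequence.upper()
    gc = sum(base in "GC" for base in sequence)
    at = sum(base in "AT" for base in sequence)
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, primer in (("primer 1", PRIMER_1), ("primer 2", PRIMER_2)):
        print(name, len(primer), gc_fraction(primer), wallace_tm(primer))
```

Both primers are 20 nucleotides long, with GC fractions of 0.55 and 0.40 and rule-of-thumb melting temperatures of 62 °C and 56 °C respectively.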


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS


Ex vivo infection of tumour explants. Primary cancer and normal tissue speci- 
mens were obtained from consenting patients who underwent tumour resection. 
The institutional review board of Ottawa Hospital Research Institute has approved 
all human studies. A total of ten tumour samples and accompanying adjacent 
normal tissue samples from the affected organ were assayed for
JX-594-GFP⁺/β-gal⁻ sensitivity. Samples were received in cell culture medium and processed
within 2-16 h. Samples were manually divided using a 15-mm scalpel blade into
approximately 10-mm³ pieces and placed on 12-well plates with minimum essential
medium (α-MEM) containing 10% fetal bovine serum (FBS), under sterile
techniques. Samples were inoculated with 1 × 10⁷ p.f.u. JX-594 in 100 µl of
α-MEM serum-free medium. JX-594 was allowed to adsorb for 2 h at 37 °C, then
1.9 ml of α-MEM supplemented with 10% FBS was added. Infected tissue specimens
were incubated in a humidified incubator at 37 °C for 24 h before imaging
JX-594-GFP⁺/β-gal⁻-driven GFP expression.

Patients. Between 21 July 2008 and 9 February 2010, 23 patients were enrolled. 
Patients had treatment-refractory, histologically-confirmed, advanced, metastatic 
solid tumours. At least one tumour mass had to be amenable to biopsy and/or fine- 
needle aspirate. Patients had adequate haematological function (leukocyte count 
>3,500 cells per mm³, CD4 count ≥200 per mm³, haemoglobin ≥0.1 g ml⁻¹,
platelet count ≥100,000 platelets per mm³) and adequate organ function (including
aspartate aminotransferase/alanine aminotransferase <2.5 × upper normal
limit, bilirubin ≤1.5 × upper normal limit and serum chemistries within normal
limits or Grade 1), a coagulation status of (international normalized ratio
(INR) ≤ upper normal limit + 10%), and Karnofsky performance score of ≥70.

Exclusion criteria included known central-nervous-system malignancy, clin- 
ically significant and/or rapidly accumulating ascites, pericardial and/or pleural 
effusions, unstable cardiac disease, increased risk of vaccination complications 
(exfoliative skin conditions such as eczema or ectopic dermatitis), clinically sig- 
nificant immunodeficiency, anticancer therapy within the preceding four weeks, 
and pulse oximetry O, saturation <90% at rest. 

All patients gave written informed consent according to good clinical practice 
guidelines. Protocol and consent forms were approved by the United States Food 
and Drug Administration and Health Canada, as well as the Institutional Review 
and Infection-Control committees at each hospital. An independent data-safety 
monitoring board reviewed the clinical safety data from each patient cohort before 
each of the four dose escalations. 

Manufacturing, product characterization and release testing. Clinical trial 
material (CTM) lots (n = 2) that were used in this study were manufactured
according to good manufacturing-practice guidelines. Virus was grown in adher- 
ent mammalian cells and purified through sucrose-gradient centrifugation or by 
tangential flow filtration. In vitro and in vivo comparability testing demonstrated 
equivalence of the two lots. Quality control tests on the final product included 
assays for sterility and endotoxin, DNA, protein, p.f.u. and genome concentration 
in the CTM; functional assays included potency and GM-CSF production. CTM 
was formulated in either phosphate-buffered saline with 10% v/v glycerol (pH 7.1), 
or 30 mM Tris with 10% (w/v) sucrose (pH 7.7). Immediately before the intraven- 
ous infusion, JX-594 was diluted in bicarbonate-buffered saline in a total infusion 
volume of 250 ml. 

Treatment. JX-594 was infused in 250ml bicarbonate-buffered saline over 
60 min. Patients in the dose-escalation portion of the trial received one of six dose 
levels (1 × 10⁵ p.f.u. kg⁻¹, cohort 1; 1 × 10⁶ p.f.u. kg⁻¹, cohort 2; 3 × 10⁶ p.f.u.
kg⁻¹, cohort 3; 1 × 10⁷ p.f.u. kg⁻¹, cohort 4; 3 × 10⁷ p.f.u. kg⁻¹, cohort 5; and
1 × 10⁹ p.f.u., expansion cohort) in a group-sequential dose-escalation design
(standard 3 × 3 design; 2-6 patients for each dose cohort) at one of four sites:
the Billings Clinic, the Cancer Center of the Carolinas, the Ottawa Hospital
Research Institute and the University of Pennsylvania. One additional patient
could be treated in each dose cohort once that cohort was cleared for safety. An
additional six patients were enrolled as part of an expansion cohort at the
approximate midpoint between the cohort-4 and cohort-5 doses, a fixed dose of 1 × 10⁹
p.f.u. (approximately 1.5 × 10⁷ p.f.u. kg⁻¹, depending on patient weight). A starting
dose of 1 × 10⁵ p.f.u. kg⁻¹ was chosen on the basis of findings from a Good
Laboratory Practice toxicology study, and demonstrated safe patient-blood con- 
centrations after intratumoural injection of JX-594 into liver-based tumours". 
Dose-limiting toxicities were defined as any of the following treatment-related 
adverse events (through the 4-week evaluation period): 1) any grade-4 toxicity 
(except isolated grade-4 lymphopenia lasting ≤7 days); 2) grade-3 or -4 hypotension,
disseminated intravascular coagulation or allergic reaction/hypersensitivity;
3) grade-3 non-haematological toxicity persisting >7 days (except if toxicity is 
transaminitis, which may last >7 days if total bilirubin is normal or grade-1, or 
flu-like symptoms that respond to standard treatments); and 4) grade-3 haemato- 
logic toxicity persisting for >7 days (except isolated lymphopenia). The MTD was 
defined as the dose immediately preceding that for which two or more dose-limiting 
toxicities were recorded. If no MTD was defined, the highest dose was defined as the
MFD.
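
The MTD/MFD definition in the preceding paragraph can be written as a short decision rule. The sketch below is an illustration under assumptions, not the trial's statistical software: the function name and return format are hypothetical, dose levels are assumed to be listed in ascending order, and the case in which the lowest dose already shows two dose-limiting toxicities is handled by returning no tolerated dose.

```python
def mtd_or_mfd(dose_levels, dlt_counts):
    """Apply the stated rule to per-cohort dose-limiting toxicity (DLT) counts.

    dose_levels -- doses in ascending order
    dlt_counts  -- number of DLTs recorded at each dose level
    Returns ("MTD", dose), ("MTD", None) or ("MFD", highest_dose).
    """
    if len(dose_levels) != len(dlt_counts) or not dose_levels:
        raise ValueError("need one DLT count per dose level")
    for index, dlts in enumerate(dlt_counts):
        if dlts >= 2:
            # MTD is the dose immediately preceding the first level with >= 2 DLTs
            previous = dose_levels[index - 1] if index > 0 else None
            return ("MTD", previous)
    # No level reached two DLTs: the highest dose tested is the MFD
    return ("MFD", dose_levels[-1])
```

With no dose-limiting toxicities at any level, as reported for this trial, the rule returns the highest dose tested as the MFD.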

Starting with a subset of patients in cohort 4, patients were pre-medicated with 
acetaminophen (and also treated every 6h as needed). Patients were advised to 
hydrate orally for 24 h before treatment (for example, ≥1 litre of solute-containing
fluids), and they received hydration through 24h after treatment (for example, 
1-2 litres of fluids, orally or intravenously). Patients were monitored after treat- 
ment for 24h in the hospital and for at least 29 days as outpatients. A physical 
exam and an interval medical history were performed on each weekly study visit. 
Safety monitoring included adverse-event monitoring (National Cancer Institute 
common toxicity criteria, version 3.0) and standard laboratory toxicity grading for 
haematology, liver and renal function, coagulation studies, serum chemistry and 
urinalysis. 

Pharmacokinetic measurements in blood. JX-594 pharmacokinetics as well as 
gene expression and replication in solid tumours were assessed. JX-594 genome 
concentration in blood was measured by qPCR as previously described'*”’. The 
pharmacokinetics of JX-594 during and immediately after administration were 
determined by qPCR. Whole EDTA-blood samples were taken before the start of 
infusion; 15, 30 and 60 min after the start of infusion; and 30, 60, 120 and 240 min 
after the end of infusion. Area under the curve (AUC), maximum concentration
(Cmax) and half-life (t₁/₂) were calculated using WinNonLin, version 5.2
(WinNonLin copyright 2010, Pharsight Corporation). The AUC was calculated
by the linear trapezoidal method and Cmax was determined directly by inspection.
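
For readers unfamiliar with the linear trapezoidal method, a minimal sketch follows. It is not the WinNonLin implementation; the function names are hypothetical, and the sampling schedule is expressed in minutes from the start of infusion on the assumption that the infusion lasted the nominal 60 min, so that samples taken 30, 60, 120 and 240 min after the end of infusion fall at 90, 120, 180 and 300 min.

```python
# Sampling times in minutes from the start of a nominal 60-min infusion
SAMPLING_TIMES_MIN = [0, 15, 30, 60, 60 + 30, 60 + 60, 60 + 120, 60 + 240]


def auc_linear_trapezoid(times, concentrations):
    """Area under the concentration-time curve by the linear trapezoidal rule."""
    if len(times) != len(concentrations) or len(times) < 2:
        raise ValueError("need at least two matched time/concentration points")
    points = list(zip(times, concentrations))
    area = 0.0
    for (t0, c0), (t1, c1) in zip(points, points[1:]):
        if t1 <= t0:
            raise ValueError("times must be strictly increasing")
        area += (t1 - t0) * (c0 + c1) / 2.0   # area of one trapezoid
    return area


def c_max(concentrations):
    """Maximum observed concentration, read directly from the data."""
    return max(concentrations)
```

The AUC carries units of concentration multiplied by time, so genome copies per ml sampled on this schedule would give an AUC in copies ml⁻¹ min.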
Antibody titres to the β-galactosidase marker transgene. Serum samples were
obtained at baseline and on days 15 and 29. Human IgG antibodies to β-gal were
measured by ELISA. Briefly, plates (NUNC MaxiSorp, Thermo Fisher Scientific)
with wells containing β-galactosidase (Sigma) or bicarbonate/carbonate buffer
only were incubated overnight at 4 °C and washed with PBS-Tween before incubation
with blocking buffer (PBS with 1% bovine serum albumin (BSA), ELISA grade
(Sigma)). Diluted serum (1:50, 1:100 or 1:200 in PBS with 0.05% Tween and 1%
BSA) was added to β-galactosidase-coated and control wells in duplicate and
incubated at 23 °C. Plates were washed and incubated with alkaline-phosphatase-labelled
goat anti-human IgG (Abcam) diluted 1:2,000. After washing, the
colorimetric substrate alkaline phosphatase yellow (pNPP, Sigma) was added, and
NaOH was added to stop colour development after 10 min. Absorbance was read
at 405 nm, and absorbance at 630 nm was subtracted. Control-well values were
subtracted to account for non-specific binding, and titre values were calculated by
comparison to a standard curve of positive sera, arbitrarily assigned a titre of 8,000.
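The background corrections and the standard-curve comparison described above might be sketched as follows. The text does not state how the standard curve was fitted, so the piecewise-linear interpolation (clamped at the ends of the curve), the function names and all numerical values are assumptions made for illustration only.

```python
from typing import List, Tuple


def corrected_signal(a405_antigen: float, a630_antigen: float,
                     a405_control: float, a630_control: float) -> float:
    """Subtract the 630 nm reference reading, then the buffer-only control well."""
    return (a405_antigen - a630_antigen) - (a405_control - a630_control)


def interpolate_titre(signal: float, standard_curve: List[Tuple[float, float]]) -> float:
    """Read a titre off (signal, titre) standards by linear interpolation.

    Assumption: signals outside the curve are clamped to its end points.
    """
    points = sorted(standard_curve)
    if signal <= points[0][0]:
        return points[0][1]
    if signal >= points[-1][0]:
        return points[-1][1]
    for (s0, t0), (s1, t1) in zip(points, points[1:]):
        if s0 <= signal <= s1:
            if s1 == s0:
                return t0
            return t0 + (t1 - t0) * (signal - s0) / (s1 - s0)
    raise ValueError("signal could not be placed on the standard curve")


# Hypothetical standard curve; the top standard carries the arbitrary titre of 8,000.
curve = [(0.1, 500.0), (0.5, 2000.0), (1.5, 8000.0)]
signal = corrected_signal(1.20, 0.10, 0.15, 0.05)
print(signal, interpolate_titre(signal, curve))
```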
Neutralizing antibody titres to vaccinia virus. This procedure is based on the 
ability of neutralizing antibodies in patient serum samples to reduce the cytopathic 
effect caused by live vaccinia virus. Serum samples obtained at baseline and on 
days 4, 8, 15, 22 and 29 were heat-inactivated, serially diluted in 96-well format 
(dilution factor 10–3,200,000) and incubated with vaccinia virus for 2 h before
transfer of the mixture onto monolayers of A2780 cells. Cell viability was measured
3 days after inoculation by a colorimetric assay based on live-cell-mediated reduction
of tetrazolium salt to formazan (Cell Counting Kit-8, Dojindo Laboratories).
The neutralizing antibody titre was defined as the reciprocal of the highest dilution
of serum that results in ≥50% cell viability.
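The titre definition can be written as a short sketch. It assumes the threshold is "at least 50% viability" and that each dilution is stored as its reciprocal (100 for a 1:100 dilution); the function name and the viability values are hypothetical.

```python
from typing import Dict, Optional


def neutralizing_titre(viability_by_dilution: Dict[int, float],
                       threshold: float = 50.0) -> Optional[int]:
    """Reciprocal of the highest serum dilution that still gives >= threshold % viability.

    Keys are reciprocal dilutions (e.g. 1000 for 1:1,000); values are % cell viability.
    Returns None when no dilution reaches the threshold.
    """
    protective = [dilution for dilution, viability in viability_by_dilution.items()
                  if viability >= threshold]
    return max(protective) if protective else None


# Invented readings for one serum sample across a tenfold dilution series.
readings = {10: 95.0, 100: 80.0, 1000: 55.0, 10000: 20.0}
print(neutralizing_titre(readings))
```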

White blood cell induction. White blood cell count was performed by routine 
laboratory testing and was included in the safety assessment as defined in the 
protocol. White blood cell counts were assessed at baseline and on days 4, 8, 15, 
22 and 29. 

Cytokine measurements. The endogenous cytokines IL-1β, IL-4, IL-6, IL-10,
TNF-α and IFN-γ were measured in multiplex in plasma samples obtained at
baseline, 3 h and 8 h after dosing, and on days 4 and 8 using a Milliplex kit as
directed by the manufacturer (Millipore). GM-CSF concentrations in plasma were
determined at baseline, 3 h and 8 h after dosing and on days 4, 8, 15, 22 and 29
using the Quantikine hGM-CSF sandwich ELISA kit as directed by the manufacturer
(R&D Systems).

Tumour biopsy analysis. Biopsies (excisional, core-needle, or fine-needle aspirate)
were obtained from all subjects 8–10 days after treatment, formalin-fixed and
paraffin-embedded. Sections were subjected to haematoxylin and eosin staining,
immunohistochemical staining for JX-594 proteins, and PCR for JX-594 genomes.
Immunohistochemistry used anti-vaccinia polyclonal antibody (Quartett) and
secondary antibody kit (Vectastain, Vector Laboratories). For detection of
β-galactosidase by immunohistochemistry, an anti-β-galactosidase polyclonal
antibody (Abcam) was used. Negative controls were run without primary antibody
and tumours from mice treated with JX-594 were included as positive controls. For
PCR, DNA was extracted from 5 × 10-µm sections using FormaPure kit
(Agencourt) and amplified using primers corresponding to the vaccinia E3L gene,
TCCGTCGATGTCTACACAGG and ATGTATCCCGCGAAAAATCA, designed
using Primer3 software’*, using QuantiTect SYBR Green PCR kit (Qiagen).

Three-dimensional reconstruction. An excisional biopsy was obtained from
patient 20 on day 10 and processed for immunohistochemical detection of
vaccinia virus. Serial sections (126) were cut and every other section was stained 
for virus. Two-dimensional pictures of the sections were then converted into a 
three-dimensional volume using HTK histology toolkit software (Robarts Imaging 
Institute). Volume reconstruction was completed using alignment and segmenta- 
tion-contouring algorithms that oriented each tissue section on top of one another. 
Each tissue-section image, once oriented, was then converted from two- 
dimensional pixels into three-dimensional voxels. These three-dimensional stacks 
were then rendered to generate the reconstructed tumour. Regions of infection 
were highlighted in green to aid visualization, and in another representation in 
which the model can be viewed in orthogonal planes, image contrast was adjusted 
to 50 (Adobe Photoshop CS2, Adobe). Scaling in the three-dimensional model is 
derived from scaling from two-dimensional images. 

Tumour response analysis. Tumour response was assessed by contrast-enhanced 
computed tomography (CT) imaging on day 29 in all patients, and on week 10 in 
patients who remained in the study. Maximum tumour diameters and Hounsfield 
units (density estimate) were obtained at all time points. Patients in the expansion 
cohort had PET-CT scans done at the same time points; standard uptake values 
were determined from PET scans. RECIST** and modified Choi criteria'*” for 
response were employed in image evaluations. Patient 22 (metastatic mesothelioma) 
was evaluated by modified RECIST for mesothelioma’’. The PET response was also 
determined for PET-evaluable patients. 

Equipment and settings. GFP expression from JX-594-GFP+/β-gal− in human
tissue explants was visualized using the Leica M205FA microscope and Leica 
microsystem LAS AF6000 acquisition software (Leica Microsystem) at 2–4×
magnification. The same exposure time was used across all samples. Images of 
tumour and normal-tissue samples for each patient were captured on the same 
day. Images were stored using Adobe Photoshop version 7.0 (Fig. 1). 


Immunohistochemistry sections were digitized using the Aperio Scanscope
(Aperio) and analysed using ImageScope v10.2.2.2319 software (Aperio). No
image adjustments were applied to Fig. 3a–f and i. Adobe Photoshop CS software
(Adobe) was used to apply linear adjustments to brightness and contrast across all
compared stains where indicated (Fig. 3j–l).
Statistical analysis. The study sample size was set to assess safety issues. The 
primary objectives were to study the safety and to determine the MTD/MFD of 
JX-594 after intravenous infusion. Secondary objectives included pharmacokinetics 
and pharmacodynamics, immune responses (neutralizing antibodies, anti-β-galactosidase
antibodies and cytokines) and delivery of JX-594 to solid tumours
after intravenous infusion. The likelihood of dose escalation, given variation in true
dose-limiting toxicities in the treated population, was calculated as per routine in
Phase 1 dose-escalation trials. The expected sample size was 18–24 patients.

Spearman’s rank correlation coefficient was used to assess the statistical
dependence between antibody induction to β-galactosidase and dose
cohort (percentage of patients with antibody induction in cohorts 1–5), as well as
between the appearance of new tumours and dose cohort (percentage of patients
with new tumours in cohorts 1–5)*°.
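For readers unfamiliar with the statistic, a self-contained sketch of Spearman's rank correlation follows: values are replaced by their ranks (ties receive the average rank) and the Pearson correlation of the ranks is computed. The function names and the per-cohort percentages are hypothetical; they are not data from the study.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises if either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: dose cohort number against percentage of responders.
cohorts = [1, 2, 3, 4, 5]
percent_with_induction = [10.0, 10.0, 30.0, 40.0, 50.0]
print(spearman_rho(cohorts, percent_with_induction))
```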
Modulation of Rab GTPase function by a protein
phosphocholine transferase


Shaeri Mukherjee1*, Xiaoyun Liu1*, Kohei Arasaki1, Justin McDonough1, Jorge E. Galán1 & Craig R. Roy1


The intracellular pathogen Legionella pneumophila modulates the 
activity of host GTPases to direct the transport and assembly of the 
membrane-bound compartment in which it resides’ °. In vitro studies 
have indicated that the Legionella protein DrrA post-translationally 
modifies the GTPase Rab1 by a process called AMPylation’. Here we 
used mass spectrometry to investigate post-translational modifica- 
tions to Rab1 that occur during infection of host cells by Legionella. 
Consistent with in vitro studies, DrrA-mediated AMPylation of a 
conserved tyrosine residue in the switch II region of Rab1 was
detected during infection. In addition, a modification to an adjacent 
serine residue in Rab1 was discovered, which was independent of 
DrrA. The Legionella effector protein AnkX was required for this 
modification. Biochemical studies determined that AnkX directly 
mediates the covalent attachment of a phosphocholine moiety to 
Rab1. This phosphocholine transferase activity used CDP-choline
as a substrate and required a conserved histidine residue located in 
the FIC domain of the AnkX protein. During infection, AnkX 
modified both Rab1 and Rab35, which explains how this protein 
modulates membrane transport through both the endocytic and 
exocytic pathways of the host cell. Thus, phosphocholination of 
Rab GTPases represents a mechanism by which bacterial FIC- 
domain-containing proteins can alter host-cell functions. 

Legionella pneumophila is an intracellular pathogen that translo- 
cates proteins called effectors into the host-cell cytosol using a type 
IV secretion system called Dot/Icm®. The Legionella protein DrrA (also 
known as SidM) is an effector that targets the host GTPase Rab1 (refs 
1-3, 5). DrrA was initially identified as a Rab1-specific guanine nucleotide
exchange factor (GEF); recent studies showed that the amino-terminal
region of DrrA has structural similarity to glutamine synthetase adenylyl
transferase (GS-ATase) and shares the catalytically important
sequence motif G-X11-D-X-D, which enables DrrA to AMPylate the
Tyr 77 residue in the class II switch region of Rab1B’. To determine if
the in vitro activity described for DrrA is biologically relevant we
examined whether the endogenous DrrA protein mediates Rab1
AMPylation when delivered into host cells during Legionella infection.

Cells were infected with a strain of Legionella that has a functional 
Dot/Icm system that delivers effectors into host cells (wild type) or an 
isogenic ΔdotA mutant that has a non-functional Dot/Icm system, and
Rab1 protein was analysed by liquid chromatography-tandem mass
spectrometry (LC-MS/MS) (Fig. 1a). Two different modifications in
the switch II region in Rab1 were detected after infection with wild-type
Legionella. A fragment that corresponded to an AMPylated
TITSSYYR peptide (mass to charge ratio, m/z = 660.5) was detected.
Unexpectedly, a form of this peptide with an unknown moiety of
183 Da (m/z = 578.5) was also detected (Fig. 1a). The ΔdotA mutant
revealed that both modifications required the delivery of effector proteins
into host cells during infection. Thus, Rab1 is modified during
Legionella infection by AMPylation and by a second unknown post-translational
mechanism.
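The quoted m/z values are mutually consistent if the peptide is observed as a doubly charged ion. That charge state is an assumption, not something the text states. Under it, a modification of mass Δm moves the peak by Δm/2. The sketch below uses the nominal shifts given in the Fig. 1 legend (329 for AMP, 165 for the second modification); the function name is illustrative. The 18 Da gap between the 165 Da shift and the 183 Da moiety is what would be expected if water is lost on attachment, but that reading is likewise an assumption.

```python
def shifted_mz(unmodified_mz: float, mass_shift: float, charge: int = 2) -> float:
    """Expected m/z of a modified peptide, given the unmodified m/z and the added mass."""
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    return unmodified_mz + mass_shift / charge


AMP_SHIFT = 329.0             # nominal mass added by AMPylation (Fig. 1 legend)
PHOSPHOCHOLINE_SHIFT = 165.0  # nominal mass added by the second modification

# Unmodified m/z values as quoted in the text and figure legends.
peptides = {"Rab1 TITSSYYR": 496.0, "Rab35 TITSTYYR": 503.0, "Rab6 SLIPSYIR": 475.0}

for name, mz in peptides.items():
    print(name,
          "AMPylated:", shifted_mz(mz, AMP_SHIFT),
          "second modification:", shifted_mz(mz, PHOSPHOCHOLINE_SHIFT))
```

The same arithmetic reproduces the values reported for the Rab35 and Rab6 peptides in Fig. 3, which supports the doubly charged reading.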

Cells were infected with mutant strains of Legionella deficient in 
effectors that could be involved in AMPylation of Rab1. In addition to
the ΔdrrA mutant, a ΔankX mutant of Legionella was examined. The
AnkX protein contains a FIC domain, which for other bacterial effectors
has been shown to have an enzymatic activity that promotes the
AMPylation of small GTPases”. When microinjected into mammalian
cells the AnkX protein disrupts membrane transport in the secretory
pathway and interferes with the sorting of transferrin from early endosomes,
consistent with AnkX being an effector that disrupts the activities
of host membrane transport proteins, potentially by Rab AMPylation™.

Rab1 AMPylation was not detected in the samples isolated from cells 
infected with the ΔdrrA mutant, indicating that DrrA is the primary
effector mediating Rab1 AMPylation in vivo (Fig. 1a). The unknown
modification (m/z = 578.5) was detected after infection with the ΔdrrA
mutant, but was not detected after infection with the ΔankX mutant.
Thus, the unknown modification to Rab1 that occurs during infection
requires AnkX. Defects in Rab1 modifications exhibited by these
Legionella mutants were complemented upon the introduction of plasmids
that restored DrrA and AnkX production (Supplementary Fig. 1a).
MS/MS analysis revealed that the unknown 183 Da moiety was attached
to Ser 79 of Rab1A, adjacent to the Tyr 80 residue AMPylated by DrrA
(Fig. 1b). These residues correspond to Ser 76 and Tyr 77 in Rab1B. 

Purified DrrA radiolabelled GST-Rab1 in vitro when 32P-α-labelled
ATP was used as a substrate, but no labelling was detected using 32P-γ-labelled
ATP, validating that DrrA mediates the attachment of AMP to
Rab1 (Fig. 1c). The structurally distinct N-terminal region of DrrA was
sufficient for AMPylation*'*'’, and no AMPylation activity was
detected for DrrA(340-533) or the DrrA(D110A,D112A) variant having
the G-X11-D-X-D adenylyl transferase domain inactivated. The
effector AnkX was unable to efficiently AMPylate Rab1, the GTP-locked
Rab1(Q70L) variant or the GDP-locked Rab1(S25N) variant,
indicating that purified AnkX does not have robust Rab1 AMPylation
activity (Fig. 1c).

The nature of the unknown modification to Rab1 requiring AnkX was 
investigated further. Cells were transfected with a plasmid encoding either 
AnkX or the variant AnkX(H229A), which has the essential histidine 
residue in the FIC domain changed to alanine. Roughly 70% of the 
Rab1 isolated from cells producing AnkX had the 183 Da moiety attached,
whereas Rab1 isolated from cells producing the AnkX(H229A) protein
was unmodified (Fig. 2a). Thus, AnkX is both necessary and sufficient 
to promote a novel post-translational modification to Rab1 by a pro- 
cess that requires a functional FIC domain. 

For molecules <200 Da, the elemental composition can often be 
determined from a highly accurate mass measurement'*. High- 
resolution MS measurements obtained for the modified Rab1 peptide 
isolated from cells producing AnkX revealed that the moiety attached 
to the Ser 79 residue had an accurate mass of 183.0661 Da (Supplementary
Fig. 1b). This moiety did not match any known post-translational
modifications, but when a metabolite database (http://metlin.scripps.edu)
was searched a near perfect match was made to
the molecule phosphocholine, which has an exact mass of 183.0660 Da.
The protonated moiety attached to the Rab1 peptide was selected and 
further dissociated by multi-stage MS analysis (MS/MS/MS). The
fragments generated were of the sizes predicted for phosphocholine
(Fig. 2b) and matched the MS/MS spectrum obtained following dissociation
of a phosphocholine standard (Supplementary Fig. 2), suggesting
that Rab1 is phosphocholinated by AnkX.

Figure 1 | Legionella infection mediates two different post-translational
modifications to Rab1. a, 3×-Flag-Rab1 was isolated from HEK293 FcγRII
cells after infection with the indicated Legionella strains. LC-MS/MS analysis
produced the extracted ion chromatograms. The peak in each graph indicates
the amount of Rab1 peptide TITSSYYR with no modifications (m/z = 496.0),
peptide containing an AMP moiety (m/z = 660.5), and peptide with an
unknown modification (m/z = 578.5). b, MS/MS spectra obtained for the
AMPylated Rab1 peptide (TITSSY(amp)YR, top) and the Rab1 peptide with
the unknown modification (TITSS*YYR, bottom) showing m/z values of their
fragments upon collision-induced dissociation. Fragments of the AMPylated
peptide (top spectrum) have a mass shift of 329 starting at y3 (y3–y7),
indicating a modification site at the first Tyr from the N terminus of the peptide.
Fragments of the peptide with the unknown modification have a mass increase
of 165 starting at y4 (y4–y6), indicating the second Ser from the N terminus was
modified. c, Autoradiographs reveal AMPylated Rab1 from in vitro reactions
containing the recombinant proteins indicated on the top of each panel and the
radiolabelled nucleotides indicated below.

If AnkX were functioning directly as a phosphocholine transferase,
the host molecule most likely to be used as a substrate in this reaction
would be CDP-choline, which is an intermediate used to synthesize
phosphatidylcholine’’. Indeed, phosphocholination of Ser 79 on Rab1
was detected for in vitro reactions containing CDP-choline and AnkX,
but not in reactions containing DrrA (Fig. 2c) or the AnkX(H229A)
protein (Supplementary Fig. 3). Increasing the amount of AnkX in the
in vitro reaction resulted in higher levels of phosphocholinated Rab1
being detected by anti-phosphocholine immunoblot analysis (Fig. 2d),
validating that Rab1 is phosphocholinated by AnkX.

Figure 2 | The Legionella effector AnkX functions as a Rab phosphocholine
transferase. a, LC-MS/MS analysis of 3×-Flag-Rab1 isolated from HEK293
cells that were either untransfected or transfected with a plasmid encoding
GFP-tagged AnkX, AnkX(H229A) or DrrA. Extracted ion chromatograms
indicate the amount of Rab1 peptide TITSSYYR with no modification
(m/z = 496) and peptide with the unknown modification (m/z = 578.5).
b, MS/MS/MS analysis on the Rab1 peptide TITSSYYR with the unknown
modification. The m/z = 184 peak corresponding to the protonated moiety
attached to the Rab1 peptide was selected and subjected to further dissociation.
Indicated are the fragments identified in the MS/MS/MS spectrum that
matched the fragments predicted upon dissociation of the protonated
phosphocholine molecule. c, The peak in each graph indicates the amount of
Rab1 peptide TITSSYYR with no modifications (m/z = 496) and
phosphocholinated peptide (m/z = 578.5) after in vitro incubation of Rab1 with
either DrrA or AnkX in the presence of CDP-choline. d, Immunoblots from in
vitro reactions that contained Rab1 and the indicated amounts of AnkX. Blots
were probed to detect phosphocholinated Rab1 (anti-PC) and total Rab1
(anti-Rab1) in each reaction.

Figure 3 | AnkX and DrrA have overlapping but non-identical Rab
specificities. a, Extracted ion chromatograms of 3×-Flag-Rab6 isolated from
HEK293 cells producing either DrrA or AnkX. Shown are graphs for the
unmodified Rab6 peptide SLIPSYIR (m/z = 475) and the AMPylated
(m/z = 639.5) and phosphocholinated (m/z = 557.5) forms of this peptide.
b, Extracted ion chromatograms of 3×-Flag-Rab35 isolated from HEK293 cells
producing either DrrA or AnkX. Shown are graphs for the unmodified Rab35
peptide TITSTYYR (m/z = 503.0), the AMPylated peptide (m/z = 667.5) and
the phosphocholinated peptide (m/z = 585.5). c, 3×-Flag-Rab35 was isolated
from HEK293 FcγRII cells after infection with the indicated Legionella strains.
The peak in each graph indicates the amount of Rab35 peptide TITSTYYR with
no modifications (m/z = 503), peptide containing an AMP moiety
(m/z = 667.5) and phosphocholinated peptide (m/z = 585.5). WT, wild type.

Phosphocholinated proteins in the size range of Rab GTPases were
detected in lysates from cells producing AnkX and were not observed
in lysates from cells producing AnkX(H229A) (Supplementary Fig. 4a).
Phosphatidylinositol 4-phosphate and phosphatidic acid levels were
not affected in cells producing AnkX, suggesting that there is no indirect
effect on phospholipid metabolism (Supplementary Fig. 4b). The intracellular
pathogen Coxiella burnetii translocates a FIC-domain effector
called CBU_2078 into host cells*’. Although proteins reacting with the
anti-phosphocholine antibody were found in the size range of small
GTPases in lysates from cells producing CBU_2078, there was no evidence
of Golgi fragmentation or endosome enlargement in these cells
(Supplementary Fig. 4a, c). Thus, defects in host membrane transport
in AnkX-producing cells probably result from phosphocholination of
a specific subset of Rab GTPases, although it cannot be excluded that
CDP-choline consumption might augment these effects.

The repertoire of Rab proteins that could be modified by Legionella
effectors in vivo was investigated. Modifications to the Rab5 protein
were not detected in cells producing either AnkX or DrrA (Supplementary
Fig. 5). DrrA mediated the AMPylation of Rab6 on
Tyr 82, but phosphocholination of Rab6 mediated by AnkX was not
detected (Fig. 3a and Supplementary Fig. 5). Rab35 is a Rab1 family
member that regulates the sorting of cargo from early endosomes,
and interfering with Rab35 function results in enlarged early endosomes”.
Importantly, specific perturbations in Rab35 function result
in a cellular phenotype that closely mirrors the defects in endosome
morphology observed in cells microinjected with purified AnkX”™*.
Phosphocholinated Rab35 was detected in samples isolated from cells
producing AnkX, and AMPylated Rab35 was detected from cells producing
DrrA (Fig. 3b). During infection, phosphocholination of Rab35
required AnkX and AMPylation of Rab35 required DrrA (Fig. 3c).
Thus, AnkX has specificity for Rab1 family members.

To test whether previously described cellular disruptions mediated
by AnkX required the FIC-domain-dependent phosphocholine transferase
activity (Fig. 4a), cellular phenotypes mediated by AnkX and the
AnkX(H229A) mutant were compared. Disruption of the Golgi apparatus
and a block in secretion of host alkaline phosphatase into the
culture supernatant were observed in cells producing AnkX but not
in cells producing AnkX(H229A) (Fig. 4b and Supplementary Fig. 7).
Importantly, when the Ser 79 residue in Rab1 was changed to alanine,
the variant protein was no longer phosphocholinated by AnkX (Supplementary
Fig. 8a); however, Rab1(S79A) interfered with secretion of
alkaline phosphatase when produced in cells (Supplementary Fig. 8b).
Thus, AnkX is targeting a residue in Rab1 that is critical for function.
There was a significant increase in the number of cells containing
enlarged early endosomes in cells producing AnkX compared to cells
producing AnkX(H229A) (Fig. 4c and Supplementary Fig. 7), consistent
with the function of Rab35 being perturbed by phosphocholination.

Figure 4 | AnkX-mediated phosphocholination modulates the function of
Rab1 and Rab35. a, Schematic representation of the AnkX protein showing
the location of the FIC domain and the four predicted ankyrin repeat homology
domains (A1–A4). The amino acid sequence in a conserved region of the FIC
domain containing the essential His 229 residue is shown. b, Secretion of
alkaline phosphatase into the culture supernatant was measured for HEK293
cells producing either GFP, GFP-AnkX or GFP-AnkX(H229A) as indicated on
the x-axis. Data are the mean ± standard deviation (s.d.) calculated from three
independent sample wells. c, The disruption of early endosomes was assessed in
COS7 cells producing either GFP-AnkX, GFP-AnkX(H229A) or GFP alone
after staining for EEA1 (mean ± s.d., n = 200, *P < 0.005 compared to the GFP
alone control). d, Binding of recombinant connecdenn to 3×-Flag-Rab35(S22N)
isolated from cells producing either GFP-CBU_2078, GFP-AnkX
or GFP-AnkX(H229A) was assessed by co-precipitation. The
Coomassie-stained SDS-PAGE gel indicates the amount of connecdenn and
3×-Flag-Rab35(S22N) in each precipitate. The anti-phosphocholine
immunoblot indicates that Rab35(S22N) isolated from cells producing GFP-AnkX
was phosphocholinated. e, Binding of purified His-tagged DrrA(340-533)
to 3×-Flag-Rab1A isolated from cells producing either GFP-CBU_2078,
GFP-AnkX or GFP-AnkX(H229A) was assessed by co-precipitation. The
immunoblots indicate the amounts of DrrA(340-533) and the levels of
phosphocholinated 3×-Flag-Rab1A present in each precipitate.

Because GEF proteins are essential for Rab activation, the effect of
phosphocholination on the binding of Rab-specific GEFs was analysed.
The eukaryotic connecdenn proteins are the only known GEFs for 
Rab35 and are required for Rab35 function in vivo’. A pronounced 
defect in binding of connecdenn was observed for phosphocholinated 
Rab35 isolated from cells producing AnkX (Fig. 4d), which would 
explain why AnkX overproduction mimics the cellular phenotype 
observed when connecdenn has been silenced in mammalian cells”. 
By contrast, there was no defect in the binding of phosphocholinated 
Rab1 with the GEF domain of DrrA (Fig. 4e), similar to what has been 
observed for DrrA interactions with AMPylated Rab1 (ref. 7). Thus, 
post-translational modifications mediated by the effectors AnkX and 
DrrA modulate the function of Rab GTPases during infection by tailor- 
ing the repertoire of proteins that interact with the modified GTPase. 

The characterization of DrrA and AnkX provides an example of 
Legionella having structurally distinct proteins with different bio- 
chemical activities that modulate the function of host vesicle transport 
proteins similarly. This concept of functional redundancy has been 
postulated but not shown clearly. The differences observed in the in 
vivo specificities shown by these two effectors, however, demonstrate 
that they are also likely to have roles in modulating Rab protein func- 
tions that do not overlap, which could explain why positive selection 
has led to the emergence of two different pathways to modify Rab 
protein function through post-translational modification. 

The reaction mediated by AnkX has similarities to the AMPylation 
reaction demonstrated for other FIC domain proteins. Both reactions 
use a nucleotide-based substrate as the donor molecule that mediates 
the post-translational modification process (Supplementary Fig. 9). 
Interestingly, in the AMPylation reaction, hydrolysis of the phosphoanhydride
bond results in protein modification by the 5'-ribonucleotide
of the donor substrate, whereas in the phosphocholination reaction the 
5'-ribonucleotide is presumably released and the phosphocholine 
group is transferred to the polypeptide chain. 

There are several examples of occasions where post-translational 
modifications introduced by bacterial toxins or effectors—which were 
thought to be the exclusive domain of pathogens—were discovered to 
represent mechanisms used to regulate eukaryotic cell functions. 
Curiously, the inclusion of a phosphocholine moiety in a protein 
structure has been indicated previously by studies examining peptides 
secreted by nematodes and from mammalian cells residing in the 
placenta’. Thus, protein phosphocholination may also be used by 
eukaryotic organisms to modulate cellular functions. 


METHODS SUMMARY 

MS/MS analysis of Rab GTPases was conducted on immunoprecipitated proteins
that were fractionated by SDS-PAGE and digested with trypsin in the gel; the
extracted peptides were separated using nano-LC and electrosprayed directly into
a linear ion trap mass spectrometer (LTQ Velos, ThermoElectron) for MS and MS/MS
analysis. All biochemical assays were conducted using purified proteins as
described in Methods. The antibody TEPC-15 (Sigma) was used to detect phosphocholinated
proteins by immunoblot analysis.


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature.

METHODS

Cell culture and transfection. COS7 and HEK293 cells were grown in Dulbecco’s
modified Eagle medium (DMEM) from Gibco (Carlsbad) containing 10% heat-inactivated
fetal bovine serum (FBS; Gibco). Cell lines were cultured at 37 °C in 5%
CO2. For transfection, COS7 or HEK293 cells were added to 12-mm coverslips in
24-well plates at a density of 2.5 × 10^4 cells per well. Cells were transfected with
0.5 µg of each plasmid. Cells were fixed 18 h after transfection in 4% PFA, permeabilized
with 0.05% saponin and processed for immunofluorescence microscopy
as described previously".

Fluorescence microscopy. Digital images were acquired with a Nikon TE300 
microscope using a ×100 1.4 NA objective lens and a Hamamatsu ORCA-ER
camera controlled by IP Lab software. 

Alkaline phosphatase secretion assay. Secretion assays were performed as 
described”. Briefly, HEK293 cells were co-transfected with a plasmid encoding 
secreted alkaline phosphatase and a plasmid encoding either GFP, GFP-AnkX or 
GFP-AnkX(H229A). After an 18 h incubation period, cells were washed with PBS 
and fresh medium was added. Cells were incubated for 10h and then the alkaline 
phosphatase secretion index was determined by measuring the ratio of alkaline 
phosphatase protein secreted into the culture medium to the total amount of 
alkaline phosphatase protein in the assay well. The Tropix PhosphaLight System 
Kit (Applied BioSystems) was used to measure alkaline phosphatase activity and the 
Tecan Infinite M1000 plate reader with iControl Software was used to detect 
chemiluminescence. Data shown are the mean ± s.d. from three independent
samples for each condition. Results were validated in two independent assays. 
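The secretion index and its summary statistics amount to a ratio followed by a mean and standard deviation over three wells. The sketch below assumes the sample (n − 1) standard deviation, which the text does not specify; the function names and luminescence readings are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def secretion_index(secreted: float, total: float) -> float:
    """Ratio of alkaline phosphatase in the medium to the total in the assay well."""
    if total <= 0 or secreted < 0:
        raise ValueError("total must be positive and secreted non-negative")
    return secreted / total


def summarize(indices: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across replicate wells."""
    return mean(indices), stdev(indices)


# Invented chemiluminescence readings (secreted, total) for three wells.
wells = [(3000.0, 10000.0), (3300.0, 11000.0), (2500.0, 10000.0)]
indices = [secretion_index(s, t) for s, t in wells]
print(summarize(indices))
```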
Protein purification and in vitro AMPylation. His-tagged and GST-tagged
proteins were purified as described previously’. Purified GST-Rab1A (2 µg) was
incubated with 0.4 µg of purified effector protein in buffer (20 mM HEPES pH 7.4,
100 mM NaCl, 1 mM MgCl2 and 0.1 mM GTPγS) for 1 h at 30 °C in
the presence of 2 µCi of 32P-α-labelled ATP or 32P-γ-labelled ATP (Perkin Elmer).
Labelled proteins were identified by autoradiography following SDS-PAGE. In
vitro AMPylation results shown are representative of three independent assays.
Bacterial strains and plasmids. The Legionella strains were grown on charcoal
yeast extract plates as described previously'*. The parental strain (wild type) was
L. pneumophila serogroup 1 strain Lp01, and the variant strains were all isogenic
mutants described previously''*”°”°, with the exception of the ΔankX ΔdrrA
double mutant, which was generated for this study using allelic exchange to
introduce the ΔankX mutation into the ΔdrrA strain as described". For all experiments,
Legionella were isolated from charcoal yeast extract plates after growth for 2
days at 37 °C. The plasmid pEGFPC2 (Clontech) was used for all GFP fusion
constructs, the plasmid pJB1806 was used to produce DrrA and AnkX in
Legionella, pQE30 (Qiagen) was used for all His-tagged constructs, and
pGEX2TK (GE LifeSciences) was used for all GST-tagged constructs. The Rab
plasmids were constructed using cDNA encoding human RAB1A, RAB5A,
RAB6A, RAB35 and canine RAB1A.

Cell lines and Legionella infection. HEK293 FcγRII cells” were used to create the
HEK293 FcγRII 3×-Flag-Rab1A stable cell line. This cell line was used for assays
examining Rab1 modifications during infection of host cells by Legionella. For each
assay, cells grown to near confluency in two 10-cm dishes were infected with
opsonized Legionella pneumophila strain Lp01 (wild type) or the isogenic mutants
at an estimated multiplicity of 100 bacteria to 1 host cell. After incubation for 0.5 h at
37 °C the cells were lysed in buffer containing 20 mM HEPES pH 7.4, 100 mM NaCl,
1 mM MgCl2, 1% Triton X-100, 1 mM PMSF and protease inhibitor cocktail
(Roche). Lysates were centrifuged at 17,900g and the post-nuclear supernatant
was then incubated with Flag-antibody-coated beads (Sigma) at 4 °C for 1 h and washed,
and the Rab1 protein was eluted using Flag peptide. Samples were then processed for
MS/MS analysis. This same approach was used to assay modifications to Rab35 after
infection, except that the HEK293 FcγRII cell line was transfected with an expression
plasmid producing 3×-Flag-Rab35 and then infected 18 h after transfection.
Golgi and early endosome disruption assay. The disruption of Golgi and early 
endosomes was assessed in COS7 cells producing GFP-AnkX, GFP-AnkX(H229A), 
GFP-CBU_2078 or GFP alone. Cells were fixed with 4% PFA 18 h after transfection 
and stained with mouse monoclonal antibodies specific for either GM130 or EEA1 
(BD Transduction Laboratories) at a dilution of 1:200. Golgi and endosome 
morphology in GFP-positive cells was assessed by fluorescence microscopy. The Golgi 
was completely fragmented in nearly all the cells producing AnkX, whereas in cells 
producing GFP-AnkX(H229A), GFP-CBU_2078 or GFP alone no significant 
fragmentation was observed compared to the background of untransfected cells. 
The early endosome disruption index represents the percentage of cells producing 
the indicated GFP protein that showed an enlarged endosome phenotype as 
determined by EEA1 staining. The data represent the mean ± s.d. from three 
independent replicates in which 200 cells were counted for each protein. P values were 
computed using Student’s unpaired t-test. Data shown were validated in two 
independent experiments. 
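
As a rough illustration of the statistics described for the disruption index, the Python sketch below computes the index as a percentage of 200 counted cells, its mean and sample s.d. over three replicates, and the pooled-variance t statistic of Student's unpaired t-test. The cell counts are hypothetical placeholders, not data from this study, and the function names are assumptions made for the example. Converting t into a P value additionally requires the t distribution, which is left out here.

```python
from math import sqrt
from statistics import mean, stdev

CELLS_COUNTED = 200  # cells scored per replicate, as stated in the text


def disruption_index(positive_cells, total=CELLS_COUNTED):
    """Percentage of scored cells showing the enlarged-endosome phenotype."""
    return 100.0 * positive_cells / total


def unpaired_t(a, b):
    """Student's unpaired t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical counts of phenotype-positive cells in three replicates each.
ankx_index = [disruption_index(c) for c in (150, 160, 155)]
gfp_index = [disruption_index(c) for c in (10, 14, 12)]

ankx_mean, ankx_sd = mean(ankx_index), stdev(ankx_index)
t_stat, dof = unpaired_t(ankx_index, gfp_index)
print(f"AnkX index: {ankx_mean:.1f} +/- {ankx_sd:.1f} %; t = {t_stat:.1f}, d.f. = {dof}")
```

With three replicates per condition the test has four degrees of freedom, so the t statistic has to be large before a difference is called significant.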
In vitro phosphocholination assay. GST-tagged Rab1A (5 μg) attached to 
glutathione agarose was incubated with thrombin to remove the tag. Rab1A was 
then incubated with 1 μg of purified effector protein in buffer (20 mM HEPES pH 7.4, 
100 mM NaCl, 1 mM MgCl2, 1 mM ATP) for 1 h at 30 °C in the 
presence of 1 mM CDP-choline. Samples were boiled in SDS-loading buffer and the 
Rab1 protein was excised after SDS-PAGE and analysed by LC-MS/MS. 
For the immunoblot analysis, the amount of AnkX in each reaction varied from 
0 μg to 1.0 μg, and the amount of modified Rab1 in each reaction was compared by 
immunoblot analysis using an antibody specific for phosphocholine (TEPC-15; 
Sigma) to detect modified Rab1 and an antibody specific for Rab1A (Santa Cruz 
Biotechnologies) to detect total Rab1. Data shown were validated in three 
independent experiments. 

Detection of host proteins modified by AnkX and CBU_2078 in vivo. HEK293 
cells in 24-well dishes were transfected with plasmids encoding GFP-CBU_2078, 
GFP-AnkX or GFP-AnkX(H229A) and cultured for 18 h. Cells were lysed in buffer 
containing 20 mM HEPES pH 7.4, 100 mM NaCl, 1 mM MgCl2, 1% Triton X-100, 
1 mM PMSF and protease inhibitor cocktail. The lysates were centrifuged at 17,900g 
and 50 μg of the supernatant was separated by SDS-PAGE for immunoblot analysis 
using the anti-phosphocholine-specific antibody TEPC-15 (Sigma). 

GEF binding assays. To measure connecdenn binding to modified Rab35, a 
plasmid encoding 3X-Flag-Rab35(S22N) was co-transfected into cells together with a 
plasmid encoding GFP-CBU_2078, GFP-AnkX or GFP-AnkX(H229A). The 
GDP-locked allele of Rab35 was used because it demonstrates enhanced connecdenn 
binding. Cellular lysates were prepared as described earlier, and post-nuclear 
supernatant was incubated with 4 μg of purified connecdenn (GenBank accession number 
NP_659414). 3X-Flag-Rab35(S22N) was precipitated using anti-Flag agarose 
(Sigma) and eluted from the beads using the 3X-Flag peptide (Sigma). The eluted 
proteins were resolved by SDS-PAGE and identified after the gel 
was stained with Coomassie brilliant blue dye. The locations of 
3X-Flag-Rab35(S22N) and connecdenn were determined by running purified connecdenn 
and 3X-Flag-Rab35(S22N) in adjacent wells. To measure DrrA binding to modified 
Rab1, HEK293 cells that stably produce 3X-Flag-Rab1 were transfected with 
GFP-CBU_2078, GFP-AnkX or GFP-AnkX(H229A). Post-nuclear supernatant (500 μg) 
was incubated with 0.5 μg of purified His-tagged DrrA (340-533). 3X-Flag-Rab1 was 
precipitated using anti-Flag agarose beads (Sigma). Immunoblot analysis was used to 
compare the amount of DrrA (anti-His), the amount of total 3X-Flag-Rab1 
(anti-Flag, Sigma) and the amount of phosphocholinated Rab1 (TEPC-15; Sigma) after 2% 
of the total amount of 3X-Flag-Rab1 precipitated from each reaction was 
fractionated by SDS-PAGE. Data shown were validated in three independent experiments. 
Mass spectrometry. 3X-Flag-Rab proteins were immunoprecipitated from cells 
using anti-Flag agarose beads, and separated by SDS-PAGE. The band 
corresponding to the 3X-Flag-Rab protein was excised from the gel and treated with DTT to 
reduce disulphide bonds and then alkylated with iodoacetamide (IAM). Trypsin 
digestion was allowed to proceed overnight. Resulting peptides were extracted from the gel 
matrix and then resuspended in aqueous buffer before final LC-MS/MS analysis. 
Nanoflow reverse-phase LC separation was carried out on a Proxeon EASY-nLC 
system (Thermo Scientific). The capillary column (75 μm × 150 mm, PICOFRIT, 
New Objective) was packed in-house: a methanol slurry containing 5 μm, 100 Å 
Magic C18AQ silica-based particles (Michrom BioResources) was forced 
through an empty capillary (with a frit at the end) using a pressurized device. The 
LC mobile phase comprised solvent A (97% H2O, 3% acetonitrile (ACN) and 
0.1% formic acid (FA)) and solvent B (100% ACN and 0.1% FA). The nano-LC 
separation was performed with the following gradient: B was increased from 7% to 
35% in 40 min and then raised to 90% in 3 min and kept there for 10 min before 
going back to 100% A for column equilibration. As peptides 
eluted from the capillary column, they were electrosprayed directly into a linear ion 
trap mass spectrometer (LTQ Velos, ThermoElectron) for MS and MS/MS analysis. 
A data-dependent mode was enabled for peptide fragmentation with one full MS 
scan followed by collision-induced dissociation (CID) of the ten most intense peptide 
ions. Dynamic exclusion was enabled to preclude repeated analyses of the same 
precursor ion. MS/MS scans were processed and searched using MASCOT 
(Matrix Science). The resulting peptide and protein assignments were filtered to 
keep only those identifications with scores above extensive homology. 
High-resolution MS/MS analysis was performed and the data were acquired on a high-resolution 
mass spectrometer (Orbitrap) by the Keck Proteomic Facility at Yale University. All 
LC-MS/MS data were validated by at least two independent experiments. 
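
The nano-LC gradient described in this section can be written as a piecewise-linear function of time. The Python sketch below is only an illustration: the breakpoints come from the text, but the function name is an assumption, and the return to 100% A is modelled as an immediate step because the text does not say how long that transition takes.

```python
# (time in min, % solvent B) breakpoints taken from the gradient described in the text
GRADIENT = [(0.0, 7.0), (40.0, 35.0), (43.0, 90.0), (53.0, 90.0)]


def percent_b(t_min):
    """Percentage of solvent B at time t_min, by linear interpolation."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    # Assumption: after the 90% B hold the system steps straight back
    # to 100% A (0% B) for column equilibration.
    return 0.0


for t in (0, 20, 41.5, 50, 60):
    print(f"{t:>5} min: {percent_b(t):5.1f} % B")
```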
α-Synuclein occurs physiologically as a helically 
folded tetramer that resists aggregation 


Tim Bartels, Joanna G. Choi & Dennis J. Selkoe 


Parkinson’s disease is the second most common neurodegenerative 
disorder. Growing evidence indicates a causative role of misfolded 
forms of the protein α-synuclein in the pathogenesis of Parkinson’s 
disease. Intraneuronal aggregates of α-synuclein occur in Lewy 
bodies and Lewy neurites, the cytopathological hallmarks of 
Parkinson’s disease and related disorders called synucleinopathies. 
α-Synuclein has long been defined as a ‘natively unfolded’ monomer 
of about 14 kDa (ref. 6) that is believed to acquire α-helical secondary 
structure only upon binding to lipid vesicles. This concept 
derives from the widespread use of recombinant bacterial expression 
protocols for in vitro studies, and of overexpression, sample heating 
and/or denaturing gels for cell culture and tissue studies. In contrast, 
we report that endogenous α-synuclein isolated and analysed under 
non-denaturing conditions from neuronal and non-neuronal cell 
lines, brain tissue and living human cells occurs in large part as a 
folded tetramer of about 58 kDa. Several methods, including 
analytical ultracentrifugation, scanning transmission electron 
microscopy and in vitro cell crosslinking confirmed the occurrence of the 
tetramer. Native, cell-derived α-synuclein showed α-helical 
structure without lipid addition and had much greater lipid-binding 
capacity than the recombinant α-synuclein studied heretofore. 
Whereas recombinantly expressed monomers readily aggregated 
into amyloid-like fibrils in vitro, native human tetramers underwent 
little or no amyloid-like aggregation. On the basis of these findings, 
we propose that destabilization of the helically folded tetramer 
precedes α-synuclein misfolding and aggregation in Parkinson’s 
disease and other human synucleinopathies, and that small 
molecules that stabilize the physiological tetramer could reduce 
α-synuclein pathogenicity. 

To identify the native state of α-synuclein in cells while avoiding the 
potential breakdown of physiological assemblies by detergents, we 
initially used native gel electrophoresis. α-Synuclein is expressed 
endogenously in many cell types, so we chose to analyse the dopaminergic 
human neuroblastoma line M17D (ref. 8) and the commonly used cell 
lines HEK293, HeLa and COS-7. Each of these cell lines predominantly 
contained a non-denatured α-synuclein-immunoreactive species 
migrating in blue native-polyacrylamide gel electrophoresis 
(BN-PAGE) at ~45-50 kDa (Fig. 1a, lanes 1-4). Because these initial results 
suggested an apparently stable oligomeric form under native conditions, 
we next probed the endogenous state of α-synuclein in normal 
brain. The frontal cortex of wild-type mice also revealed a ~45-50 kDa 
form of endogenous α-synuclein as the main species in the 
buffer-soluble fraction (Fig. 1a, lane 6). 

To assess the state of endogenous α-synuclein in living human cells, 
we examined freshly collected red blood cells (RBCs), which were 
recently found to have high α-synuclein expression. Human RBCs 
contained a ~45-50 kDa α-synuclein-immunoreactive band on 
BN-PAGE (Fig. 1a, lane 5). As a second non-denaturing gel system that 
precludes effects of the Coomassie dye used in BN-PAGE, we 
performed clear native-PAGE (CN-PAGE). The main α-synuclein 
species in all samples migrated at ~55-60 kDa, suggesting a tetramer 
(theoretical mass of monomer = 14,460 Da) (Fig. 1b, lanes 1-6). The 
better resolution of CN-PAGE without Coomassie dye also revealed 
small amounts of apparent monomers running below the 14 kDa 
molecular weight marker (Fig. 1b, lanes 1-4, 6) and distinguished 
the small differences in amino acid length of the human and mouse 
α-synuclein monomers and putative tetramers (Fig. 1b, lane 6). The 
endogenous ~55-60 kDa species was detected by monoclonal 
α-synuclein antibodies Syn1, Syn211 and LB509 and polyclonal antibody 
C20 in both native gel systems. 

Because the migration of a protein on BN- or CN-PAGE depends not 
only on its mass but also on its conformation and charge, we 
used in vivo crosslinking to preserve the assembled state of the putative 
α-synuclein oligomer, followed by denaturing SDS-PAGE. We 
observed SDS-stable α-synuclein bands migrating at the expected positions 
of a tetramer (~55 kDa) and non-crosslinked monomer in all 
Figure 1 | Western blot analysis of lysates of M17D, HeLa, HEK293 and 
COS-7 cells, mouse cortex and human RBCs probed for endogenous 
α-synuclein. a, BN-PAGE. b, CN-PAGE. The band just below the main 
~55-60 kDa RBC species (lane 6) may represent an alternatively spliced form of 
α-synuclein. Arrow marks a possible dimeric species. c, Left, SDS-PAGE/ 
western blot (antibody C20) analysis of cell lysates without crosslinking. Right, 
proteins were crosslinked in intact living cells with membrane-permeable 
disuccinimidyl suberate (DSS) (M17D, HeLa, HEK293, COS-7) or in RBC 
lysate with water-soluble bis(sulphosuccinimidyl) suberate (BS3) and then run 
on SDS-PAGE. 
cells, plus some putative dimer in the HeLa, HEK and red blood cells 
(right panel of Fig. 1c). This in vivo crosslinking supports the existence 
of native tetramers in cells. We performed two-dimensional gel analysis 
after the in vivo crosslinking; that is, isoelectric focusing (IEF) to 
separate proteins by charge in a pH gradient followed by denaturing 
SDS-PAGE. The higher-migrating α-synuclein species in the 
crosslinked RBC lysates had the same pKa as monomers, within the limits of 
IEF resolution (Supplementary Fig. 1), consistent with their being 
homo-oligomers. 

Next, we developed a non-denaturing method to purify native 
α-synuclein from soluble RBC lysates (see Methods and Nature 
Protocol Exchange (http://www.nature.com/protocolexchange/protocols/2136)). 
This allowed us to estimate the mass of native α-synuclein 
based on distinct measurement principles that are not affected by protein 
conformation, unlike gel electrophoresis. Scanning transmission 
electron microscopy (STEM) is useful for measuring the masses of purified, 
non-covalently bonded complexes that may not resist ionization during 
mass spectrometry. STEM images of α-synuclein purified under 
non-denaturing conditions from human RBCs (Supplementary Fig. 2) 
yielded a homogeneous distribution of roughly spherical particles 
measuring ~3.0-3.5 nm in diameter (Fig. 2a). Unbiased automatic sampling 
of 1,000 particles gave a size distribution pattern with a peak at ~55 kDa 
(Fig. 2b). Importantly, we next applied sedimentation equilibrium 
analytical ultracentrifugation (SE-AUC), a technique commonly used 
to establish the oligomeric state of native proteins independent of their 
conformation. SE-AUC analysis of purified, native RBC α-synuclein 
performed at three different concentrations and at different rotor speeds 
yielded an average molecular weight of 57.8 kDa (4.78 Svedbergs), 
strongly supporting a tetrameric assembly state (Fig. 2c). 
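
The step from a measured mass to an oligomeric state is simple arithmetic: four monomers of 14,460 Da (the theoretical monomer mass given earlier in the text) total 57,840 Da. The Python sketch below picks the oligomer size whose theoretical mass lies closest to a measured value. The function names and the upper limit of eight subunits are assumptions made for the illustration.

```python
MONOMER_DA = 14_460  # theoretical monomer mass stated in the text


def oligomer_mass(n_subunits):
    """Theoretical mass in Da of a homo-oligomer with n_subunits monomers."""
    return n_subunits * MONOMER_DA


def closest_oligomer(measured_da, max_subunits=8):
    """Subunit count whose theoretical mass is nearest to the measured mass."""
    return min(range(1, max_subunits + 1),
               key=lambda n: abs(oligomer_mass(n) - measured_da))


# Masses reported in the text: SE-AUC average and approximate STEM peak.
for label, mass in (("SE-AUC", 57_800), ("STEM peak", 55_000)):
    n = closest_oligomer(mass)
    print(f"{label}: {mass} Da is closest to n = {n} ({oligomer_mass(n)} Da)")
```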

Numerous studies have reported conformational changes in 
α-synuclein, with a focus on the natively unfolded recombinant monomer 
undergoing a random coil to α-helix transition upon in vitro 
interaction with small lipid vesicles. This change is believed to be 
relevant to the poorly defined physiological function of α-synuclein 
in cells and could potentially decrease the likelihood of its aggregation 
into β-sheet-rich neurotoxic assemblies. Unexpectedly, we found 
that circular dichroism spectra of the human RBC tetramer purified 
under non-denaturing conditions showed two minima of mean 
residue ellipticity at 222 and 208 nm (Fig. 3a), characteristic of an 
α-helically folded protein. This result is inconsistent with the 
common assumption that α-synuclein is natively unfolded. The addition 
of negatively charged, small unilamellar lipid vesicles (SUVs) did 
not induce a significant conformational change in the native tetramer 
by circular dichroism (Fig. 3a), but a random coil to α-helical 
conversion did occur (as reported) with recombinant monomer that had 
been expressed in bacteria (Fig. 3b). Incubation of the purified RBC 
α-synuclein tetramer with Lipidex 1000, a reagent used to strip proteins 
of bound lipids and fatty acids, did not change the conformation of 
the α-helical α-synuclein tetramer (Supplementary Fig. 3), suggesting 
that significant lipid association is not required to maintain the folded 
structure of cellular α-synuclein. To support this possibility, we 
conducted a quantitative elemental phosphate analysis on the purified 
native α-synuclein to search for phospholipid. We obtained an average 
value of 0.25 mol phosphate per mol α-synuclein, making a significant 
presence of phospholipids on the α-helical α-synuclein purified from 
normal cells unlikely. Because post-translational modifications also 
could have an impact on the conformational differences between the 
native human RBC tetramer and the bacterially expressed, recombinant 
human monomer, we performed mass spectrometry. The 
recombinant protein showed a mass peak at 14,462 Da, very close 
to the theoretical predicted mass of 14,460 Da, whereas the purified 
erythrocyte α-synuclein showed a peak at 14,505 Da, indicative of 
only an N-α-acetylation commonly present on human proteins 
(theoretical predicted mass = 14,502 Da) (Supplementary Fig. 4). 
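
The acetylation assignment rests on a mass difference. One acetyl group adds about 42 Da, so an unmodified mass of 14,460 Da becomes roughly 14,502 Da. The Python sketch below makes that check explicit, taking all masses in daltons (consistent with the ~14 kDa monomer). The function names are assumptions for illustration, and the 42.04 Da shift is the standard average mass added by acetylation rather than a value given in the text.

```python
ACETYL_SHIFT_DA = 42.04  # average mass added by one acetyl group (C2H2O)
UNMODIFIED_DA = 14_460   # theoretical mass of unmodified monomer, from the text


def expected_mass(n_acetyl, unmodified_da=UNMODIFIED_DA):
    """Expected mass in Da of a monomer carrying n_acetyl acetyl groups."""
    return unmodified_da + n_acetyl * ACETYL_SHIFT_DA


def inferred_acetyl_groups(observed_da, unmodified_da=UNMODIFIED_DA):
    """Nearest whole number of acetyl groups explaining the observed mass."""
    return round((observed_da - unmodified_da) / ACETYL_SHIFT_DA)


# Observed peaks reported in the text.
print(inferred_acetyl_groups(14_462))  # recombinant protein
print(inferred_acetyl_groups(14_505))  # erythrocyte-derived protein
```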

To validate the above results obtained on RBC α-synuclein using a 
different human cell type and a different non-denaturing purification 
method, we isolated α-synuclein from an M17D human neuroblastoma 
cell line stably overexpressing wild-type human α-synuclein (3D5 
cells). α-Synuclein from untransfected M17D cell lysates migrated 
above bacterially expressed α-synuclein of confirmed random coil 
structure on CN-PAGE (Supplementary Fig. 5A). This higher 
electrophoretic migration was also true of native (α-helical) but not 
Figure 2 | Sizing analyses of α-synuclein from human RBCs. 
a, Representative large-angle dark-field STEM image of purified α-synuclein 
from human RBCs. A few representative particles are circled. As an internal size 
standard, tobacco mosaic virus (TMV) helical rod was included during electron 
microscopy specimen preparation. b, Mass histogram (bin size = 5 kDa) of 
1,000 automatically selected α-synuclein particles. c, SE-AUC of purified, 
native RBC α-synuclein. Top panel shows the individual experimental analyses 
fitting an ideal single-species model to the equilibrium data obtained at 11,612, 
20,644 and 32,256g for 1.1 mg ml⁻¹ α-synuclein solution. The fitting yielded a 
molecular weight of 57,753 Da (standard deviation (s.d.) ± 655.199) with a root 
mean squared deviation of 0.004533. D, attenuance. Bottom panel shows an 
overlay of the residuals of data and theoretical fit for the three different speeds. 
denatured (random coil) purified RBC α-synuclein (Supplementary 
Fig. 5B). After α-synuclein was purified from the stably transfected 
3D5 cell line or from RBCs, the two differentially purified and 
α-helically folded (by circular dichroism) cellular proteins co-migrated 
at ~55-60 kDa on CN-PAGE, as expected (Supplementary Fig. 6). 
Unbiased, automated STEM measurements of 3,000 particles revealed 
that the 3D5 neuroblastoma cells contained α-synuclein tetramers of 
closely similar estimated molecular weight (peak mass ~55 kDa) to 
those of the RBC α-synuclein (Supplementary Fig. 7; compare to 
Fig. 2b). Circular dichroism spectroscopy revealed the purified 3D5 
cell α-synuclein to have two minima of mean residue ellipticity at 222 
and 208 nm (Supplementary Fig. 8). To further exclude artefacts 
arising during purification of cellular α-synuclein such as adventitious 
association of biomolecules (for example, cellular lipids not removed 
by Lipidex 1000) that artificially fold the protein, we repeated our 
experiments with the 3D5 parental line M17D, which has only low 
levels of endogenous α-synuclein. We added (‘spiked’) bacterially 
expressed recombinant human monomer onto the M17D cells before 
performing cell lysis and the full purification, and then assayed its 
structural properties. This exposure to cell lysates and the purification 
procedure led to no induction of helical folding in the recombinant 
human α-synuclein (Supplementary Fig. 9), whereas simultaneously 
purified 3D5 cell human α-synuclein did show this conformation, 
supporting our conclusion that α-helically folded α-synuclein does 
not arise owing to artificial manipulation of the protein. 

Membrane association has been viewed as a principal functional 
property of α-synuclein in vitro and in living cells. We searched for 
differential binding of recombinant monomeric human α-synuclein 
versus RBC tetrameric human α-synuclein to a lipid membrane using 
surface plasmon resonance (SPR). Because recombinant α-synuclein is 
reported to have preferential affinity for negatively charged lipids, 
especially phosphatidyl serine, we chose a mixed phosphatidyl choline 
and phosphatidyl serine (PC/PS) membrane as a model membrane. 
Exposure of a PC/PS membrane to cell-derived, purified native 
α-synuclein in a Biacore instrument produced a markedly increased 
resonance angle shift compared to conventional recombinant monomers 
at identical concentrations in solution (Fig. 3c), indicating dramatically 
Figure 3 | Comparative analyses of native (cell-derived) and bacterial 
α-synuclein. a, Circular dichroism spectra of native tetrameric α-synuclein 
(isolated under non-denaturing conditions from human RBCs) before versus 
after addition of POPC/POPS SUVs (PC/PS 4:1; protein:lipid 1:500). 
b, Circular dichroism spectra of recombinant α-synuclein monomer purified 
from E. coli, alone and with addition of PC/PS SUVs (protein:lipid 1:500). 
c, SPR sensorgram of equal protein concentrations of α-synuclein 
increased lipid binding. Fitting a dilution series of α-synuclein tetramer 
injections to a two-state binding model (Supplementary Fig. 10) gave an 
apparent dissociation constant of K_app = 56 ± 61 nM, which is about two 
orders of magnitude lower than values obtained for recombinant 
monomer in analogous SPR studies. We next tested the amyloid 
aggregation propensity of the distinct species in a Thioflavin T 
(ThT) fluorescence assay. Monomeric and tetrameric α-synuclein 
showed very different characteristics, with samples of purified cellular 
α-synuclein incubated under identical conditions showing no evidence 
of fibril formation in a time (10 days) more than sufficient to form 
mature, ThT-bound fibrils from equivalent amounts of unfolded 
recombinant α-synuclein (Fig. 3d). Analysis of protein concentration 
in the solution after the 10-day incubation showed that the RBC 
α-synuclein was still present and soluble, making non-amyloid (that 
is, ThT-negative) aggregation of the tetramers unlikely. Interestingly, 
melting curves of purified tetrameric α-synuclein showed that heat 
denaturation (at 95 °C) seemed irreversible under our conditions 
(Supplementary Fig. 11). 

Our experiments provide several independent lines of evidence that 
endogenous cellular α-synuclein exists in large part as an α-helically 
folded, ~58 kDa tetramer under native conditions. This finding is in 
contrast to many biophysical and biochemical studies describing 
α-synuclein as a natively unfolded ~14 kDa monomer. In an early 
study of bacterially expressed recombinant protein purified under 
non-denaturing conditions or with heat treatment, no conformational 
differences were observed, and it was concluded that α-synuclein is a 
natively unfolded monomer. This suggests problems in generating 
properly folded protein in Escherichia coli, although a modified 
bacterial expression protocol avoiding heating and denaturants has 
recently been found to yield a helical α-synuclein tetramer closely 
resembling the species found by us in native human samples (W. 
Wang et al., personal communication). The reasons for the 
conformational differences observed in these two bacterial studies are unknown. 
Gel filtration of unfolded recombinant α-synuclein also showed 
an apparent molecular weight of ~60 kDa in some earlier studies; the 
data were interpreted as a decrease in mobility of the extended state of 
an unfolded protein in the tested matrices. This suggests the possibility 
recombinant monomer versus endogenous tetramer injected on an L1 chip 
covered with a PC/PS membrane. d, Amyloid-type aggregation kinetics of 
recombinant α-synuclein monomer versus native RBC tetramer monitored by 
ThT fluorescence; average values from 3 independent experiments (error 
bars show s.d.; some s.d. values for RBC-derived α-synuclein are smaller than 
the symbol size). 
of a similar hydrodynamic radius for the unfolded monomer and the 
more compact, helically folded tetramer, making gel filtration an 
unreliable indicator (and therefore not used here). Our evidence for a 
tetrameric molecular mass of endogenous α-synuclein was particularly 
supported by the analytical ultracentrifugation and the unbiased 
STEM analysis, neither of which depends on protein 
conformation. The STEM sizing was performed on intrinsic α-synuclein 
isolated from two cell types and using two distinct non-denaturing 
procedures. 

Our apparent disagreement with most published findings on the 
monomeric state of α-synuclein in cells and brain tissue, usually as 
judged by SDS-PAGE and western blotting, can be explained by the 
common use of denaturing detergents. Our tetramer aggregation data 
(Fig. 3d) are consistent with a recent report describing non-neurotoxic, 
aggregation-resistant α-synuclein oligomers in vivo. Moreover, an 
oligomeric species of α-synuclein (size undefined) was observed by 
in vivo fluorescence lifetime imaging in an intact cell culture model. 
Given the close match between our observed molecular weights using 
SE-AUC (Fig. 2c) and STEM (Fig. 2b and Supplementary Fig. 7) and 
the theoretical weight of a tetramer, the detection of a tetrameric band 
on denaturing gels after in vivo crosslinking (Fig. 1c), and the IEF 
evidence post-crosslinking that the endogenous tetramer and dimer 
bands have pKa values similar to that of a monomer (Supplementary 
Fig. 1), we conclude that the predominant physiological species of 
α-synuclein in cells and brain is a helically folded tetramer, although 
minor and variable amounts of monomers, dimers and trimers were 
detected in some cell types. The closely similar properties of α-synuclein 
observed until now in neural cells and fresh human RBCs recommend 
the latter as an abundant, available source for future studies of 
physiological α-synuclein. 

The higher lipid-binding capacity of native α-synuclein leads us to 
speculate that the monomer represents a not fully functional and less 
abundant species in normal cells. Given the much lower propensity of 
the native tetramer to aggregate into fibrils (Fig. 3d), it is likely that 
tetramers undergo destabilization before α-synuclein aggregates into 
abnormal oligomeric and fibrillar assemblies that can confer cytotoxicity 
in Parkinson’s disease and other α-synucleinopathies. Hypothetically, 
such a mechanism could be analogous in part to transthyretin 
amyloidosis, in which a native metastable tetramer circulates in human plasma 
but can become destabilized (for example, by pathogenic missense 
mutations) to allow monomers to aggregate aberrantly in tissue. Our 
identification of helically folded α-synuclein tetramers encourages the 
design of compounds that, like those for transthyretin, could 
kinetically stabilize native tetramers and prevent pathogenic α-synuclein 
aggregation as a novel treatment approach for Parkinson’s disease, 
dementia with Lewy bodies and other synucleinopathies. 


METHODS SUMMARY 


Native gel electrophoresis was conducted as described. For crosslinking, 1-5 mM 
disuccinimidyl suberate was added to living cells. RBC lysates were treated 
analogously but using 1 mM bis(sulphosuccinimidyl) suberate. To purify α-synuclein from 
fresh or packed frozen RBCs, an initial 25% (NH4)2SO4 cut followed by a 50% 
(NH4)2SO4 precipitation substantially enriched α-synuclein. The resolubilized 50% 
pellet was injected onto a hydrophobic interaction column (HiTrap Phenyl HP; GE 
Healthcare) and eluted in 1 M to 0 M (NH4)2SO4, pH 7. Alternatively, 
α-synuclein-overexpressing 3D5 neuroblastoma cell lysate after (NH4)2SO4 precipitation 
was injected onto a 5-ml HiTrap Q HP column. A 25-500 mM NaCl (pH 8.0) gradient eluted 
α-synuclein at ~300 mM NaCl. α-Synuclein from both cell sources underwent a final 
purification step on a Superdex 75 SEC column. STEM analysis was conducted at 
the Brookhaven National Laboratory STEM user facility. Sedimentation equilibrium 
data were acquired on a Beckman XL-I analytical ultracentrifuge at speeds of 11,612, 
20,644 and 32,256g (AN-60 Ti rotor) and protein concentrations of 0.6, 1.1 and 
1.6 mg ml⁻¹. Circular dichroism spectroscopy for lipid-induced α-synuclein folding 
was conducted in the presence of 4 mM PC/PS (4:1) SUVs. SPR spectroscopy was 
conducted as described. To quantify amyloid fibril growth, aliquots (10 μl) of 
purified α-synuclein were added to a 10 μM ThT solution in 10 mM glycine buffer, 
pH 9. ThT fluorescence was measured by exciting at 444 nm and scanning the 
emission wavelengths from 460-550 nm. 
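
The centrifuge speeds in this summary are quoted as relative centrifugal force (g) rather than revolutions per minute. A minimal Python sketch of the standard conversion, RCF = 1.118 × 10^-5 × r × rpm^2 with r in cm, is given below. The 7.2 cm radius is an assumed value chosen for illustration and is not stated in the text, and the function names are likewise hypothetical.

```python
from math import sqrt

RCF_CONSTANT = 1.118e-5      # standard factor for radius in cm and speed in rpm
ASSUMED_RADIUS_CM = 7.2      # assumption for illustration; not given in the text


def rcf_from_rpm(rpm, radius_cm=ASSUMED_RADIUS_CM):
    """Relative centrifugal force (multiples of g) at the given radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm=ASSUMED_RADIUS_CM):
    """Rotor speed in rpm that produces the given RCF at the given radius."""
    return sqrt(rcf / (RCF_CONSTANT * radius_cm))


# RCF values reported in the text for the three equilibrium speeds.
for rcf in (11_612, 20_644, 32_256):
    print(f"{rcf} g corresponds to about {rpm_from_rcf(rcf):,.0f} rpm at the assumed radius")
```

Under this assumed radius the three reported values fall close to round rotor speeds, which is a quick plausibility check when reproducing such a run; the actual radius and speeds should be taken from the instrument records.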
Full Methods and any associated references are available in the online version of 

METHODS 


Materials. Recombinant human α-synuclein was bought from Anaspec. 
Recombinant human transthyretin was provided by I. Rappley and J. Kelly 
(Scripps Research Institute). HEK, COS-7 and HeLa cells were cultured in 
DMEM with 10% fetal bovine serum, penicillin (100 U ml⁻¹), streptomycin 
(100 μg ml⁻¹) and L-glutamine (2 mM). For M17D and 3D5 human neuroblastoma cells, 
standard DMEM was supplemented with 400 μg ml⁻¹ G418 and 1 μg ml⁻¹ 
puromycin. Frontal cortex was obtained from wild-type mice aged 4-9 months. 
α-Synuclein immunoblotting used antibodies C20 (1:1,000, Santa Cruz), LB509 
(1:400, Santa Cruz), Syn211 (1:200, Santa Cruz) and Syn1 (1:2,000, BD). 

Lipid preparation. PC/PS SUVs (30 nm) of 80% 
1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC) and 20% 
1-palmitoyl-2-oleoyl-sn-glycero-3-[phospho-L-serine] (POPS) (Avanti Polar Lipids) 
were prepared in 10 mM sodium phosphate, pH 7.4, by sonication. 

Crosslinking. Cells were detached and incubated at room temperature (22 °C) for 
30 min in DSS crosslinker (1-5 mM), then quenched with 1 M Tris buffer, pH 7.4, 
for 15 min at room temperature. Human RBC lysates were treated analogously 
with 1 mM BS3 (Pierce) to covalently crosslink lysine residues. 

BN-PAGE and CN-PAGE. For BN-PAGE, samples were run on 4-16% Bis-Tris 
BN-PAGE gels (Invitrogen) at 200 V and room temperature. The cathode buffer 
was 50 mM tricine, 15 mM Bis-Tris, 0.02% Brilliant Blue G (Serva), pH 7.0; the 
anode buffer was 50 mM Bis-Tris pH 7.0. CN-PAGE was conducted identically to 
BN-PAGE, but Coomassie blue was omitted from the sample and the cathode 
buffer. Electroblotting of protein onto PVDF membranes (0.45 μm pore size) was 
conducted at 400 mA for 2 h. For molecular weight estimation, three different 
molecular weight markers were loaded on each gel (Sigma Non-denaturing, 
108K6408; Invitrogen Native Mark, LC0725; GE Healthcare HMW Native 
Marker Kit, 17-0445-01). 

IEF two-dimensional PAGE. We used the IPGphor IEF system (GE Healthcare). 
Lysates were heated at 65 °C overnight and brought to 200 μl with sample 
rehydration buffer (7 M urea, 2 M thiourea, 2% CHAPSO, 0.5% IPG buffer (GE 
Healthcare), bromophenol blue) and applied on an 11 cm 1D Ready-strip 
(Bio-Rad) with a pH gradient of 4-7. Sample was rehydrated for 16 h followed by IEF at 
500 V for 30 min, then 1,000 V for another 30 min, and then 8,000 V for 3.5 h. The 
1D strip was then applied to a precast NuPAGE ZOOM 4-12% Bis-Tris gel 
(Invitrogen) and run at 200 V. 

Purification of α-synuclein from human RBCs. Freshly collected and washed 
RBCs were resuspended in a threefold volume of ACK lysing buffer (Lonza). 
(NH4)2SO4 was added to a final concentration of 25% and the lysate was incubated 
at 4 °C for 30 min. The lysate was centrifuged (20,000g, 20 min), and the supernatant brought 
up to 50% (NH4)2SO4. The pellet was washed several times in 55% (NH4)2SO4 to 
remove excess haemoglobin. The sample was centrifuged at 20,000g for 20 min 
and the pellet resolubilized in a 50-fold volume of 50 mM phosphate buffer, pH 
7.0, 1 M (NH4)2SO4. Five millilitres of the resultant solution were injected onto a 
5-ml HiTrap phenyl hydrophobic interaction column (GE Healthcare) 
equilibrated with 50 mM phosphate buffer, pH 7.0, 1 M (NH4)2SO4. α-Synuclein was 
eluted with a 1 M to 0 M (NH4)2SO4 gradient in 50 mM phosphate buffer, pH 7.0 
(α-synuclein eluted at ~0.75 M (NH4)2SO4). For anion exchange purification of 
RBC α-synuclein, we used the protocol used for neuroblastoma cells (see later), but 
the first run of RBC lysate sometimes showed low binding of α-synuclein and 
contamination by plasma transthyretin. In these cases we discarded the first eluate 
and used the flow-through for a second run, which showed significantly higher 
binding and subsequent purity. As a third alternative to hydrophobic interaction (HIC) 
and anion exchange (AX) chromatography, an XK 16/100 column packed with 
activated thiopropyl Sepharose 6B gel media was used 
(Supplementary Fig. 5B) (binding buffer, PBS; flow rate, 0.2 ml min⁻¹). In this 
case, the flow-through contained α-synuclein and was processed further. The 
column was regenerated by eluting bound protein with 5 column volumes of 
binding buffer with 25 mM dithiothreitol, and reactivated with 1.5 mM dipyridyl 
sulphide in 50 mM borate buffer pH 8.0. The final solution was concentrated 
(Amicon Ultra centrifugal filter units, MWCO 10,000, Millipore) and further 
purified via gel filtration. 

Purification of α-synuclein from human neuroblastoma cells. 3D5 cells
(α-synuclein stables) and their parental M17D cells were scraped from the plates,
washed in PBS and lysed by sonication. An (NH4)2SO4 precipitation was conducted
as described earlier; the 50% (NH4)2SO4 pellet was taken up in 20 mM Tris buffer,
pH 8.0, 25 mM NaCl. The sample was injected onto a 5-ml HiTrap Q HP anion
exchange column, equilibrated with 20 mM Tris buffer, pH 8.0, 25 mM NaCl.
α-Synuclein was eluted from the column with a 25-500 mM NaCl gradient in
20 mM Tris buffer, pH 8.0. α-Synuclein was eluted at ~300 mM NaCl. The
column was regenerated with 1 M NaCl in 20 mM Tris buffer, pH 8.0. The final
solution was concentrated (Amicon Ultra centrifugal filter units, MWCO 10,000,
Millipore) and further purified via gel filtration. For the addition (‘spiking’) of
exogenous recombinant α-synuclein monomers, bacterially expressed α-synuclein
was added to the scraped M17D cell pellet and the purification scheme conducted
as just described.

Gel filtration. Aliquots (250 µl) were injected onto either a Superdex 75 (10/300
GL), Superdex 200 (10/300 GL) or a Superose 12 (10/300 GL) column (GE 
Healthcare) at 4°C and eluted with 50 mM ammonium acetate, pH 8.5. For size 
estimation, a gel filtration standard (Bio-Rad, cat. no. 151-1901) was run on each 
column, and the calibration curve was obtained by semi-logarithmic plotting of 
molecular weight versus elution volume divided by void volume. 
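
The semi-logarithmic calibration described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis: it fits log10(molecular weight) against elution volume divided by void volume by ordinary least squares and then reads off an apparent molecular weight for an unknown peak. The molecular weights are the nominal values of the five components of the Bio-Rad standard; the elution volumes, the void volume and all function names are hypothetical placeholders.

```python
import math

# Nominal molecular weights (Da) of the Bio-Rad gel filtration standard components.
STANDARD_MW = [670_000, 158_000, 44_000, 17_000, 1_350]
# Hypothetical elution volumes (ml) and void volume (ml); replace with measured values.
ELUTION_ML = [8.9, 11.6, 13.9, 15.6, 19.4]
VOID_ML = 8.0


def fit_calibration(mw, elution_ml, void_ml):
    """Least-squares line log10(MW) = slope * (Ve / V0) + intercept."""
    x = [ve / void_ml for ve in elution_ml]
    y = [math.log10(m) for m in mw]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_mw(elution_ml, void_ml, slope, intercept):
    """Apparent molecular weight (Da) for a peak eluting at elution_ml."""
    return 10 ** (slope * (elution_ml / void_ml) + intercept)


slope, intercept = fit_calibration(STANDARD_MW, ELUTION_ML, VOID_ML)
apparent_mw = estimate_mw(13.0, VOID_ML, slope, intercept)
```

Because the fit is linear in log10(MW), a small error in elution volume translates into a multiplicative error in the apparent molecular weight, which is why the standard is run on each column separately.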

STEM. STEM was carried out at the Brookhaven National Laboratory STEM user 
facility with 100 µl of sample at a concentration of 300 µg ml⁻¹ in 50 mM ammonium
acetate, pH 7.4, and diluted to find the appropriate concentration for a
homogeneous particle distribution. Tobacco mosaic virus (TMV) rods were
included during specimen preparation as an internal sizing standard. 

Circular dichroism spectroscopy. Circular dichroism spectra were obtained 
using an Aviv Biomedical spectrometer (model 410) in the presence or absence 
of 4 mM PC/PS SUVs. The spectral contributions of buffer and SUVs were sub- 
tracted. Data are reported as mean residue ellipticities measured at 20 °C and a 
pathlength of 0.1 mm. 

Lipidex 1000 treatment. 10% (w/v) Lipidex 1000 (Perkin Elmer) was washed with 
50% methanol-ultra pure water and added to a 100 µM solution of purified
α-synuclein from RBCs. The samples were stirred overnight at 37 °C, and α-synuclein
was purified from that mixture via size exclusion chromatography. 

SPR. All lipid binding experiments were performed at 20 °C on a BIACORE 3000
apparatus using the L1 sensor chip (Biacore AB). The running buffer was 10 mM
sodium phosphate, pH 7.4. SUVs were applied to the sensor chip surface at a flow
rate of 10 µl min⁻¹ in the presence of 0.1 mM NaCl. Injections were done at a flow
rate of 10 µl min⁻¹ with 50 µl sample volume. Apparent K_D values were calculated
from equilibrium data of several dilution series, collected at 300-320 s.
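
One way to obtain an apparent K_D from equilibrium responses is sketched below. The text does not state which binding model was fitted, so a single-site Langmuir isotherm, R_eq = R_max C / (K_D + C), is an assumption here, as are the function name, the grid-search strategy and the synthetic dilution series.

```python
def fit_langmuir(conc, response, kd_min=1e-3, kd_max=1e3, n_grid=2001):
    """Fit R_eq = R_max * C / (K_D + C) by scanning K_D on a log grid.

    For each trial K_D the best R_max follows in closed form from
    linear least squares, so only K_D needs to be searched.
    """
    best = None
    for i in range(n_grid):
        kd = kd_min * (kd_max / kd_min) ** (i / (n_grid - 1))
        f = [c / (kd + c) for c in conc]
        r_max = sum(r * fi for r, fi in zip(response, f)) / sum(fi * fi for fi in f)
        sse = sum((r - r_max * fi) ** 2 for r, fi in zip(response, f))
        if best is None or sse < best[0]:
            best = (sse, kd, r_max)
    _, kd, r_max = best
    return kd, r_max


# Hypothetical dilution series (arbitrary concentration units) generated from
# K_D = 2.0 and R_max = 100 response units, used only to exercise the fit.
conc = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
response = [100.0 * c / (2.0 + c) for c in conc]
kd_fit, r_max_fit = fit_langmuir(conc, response)
```

The grid resolution limits the precision of the returned K_D to under 1% with the default settings; a dedicated nonlinear optimizer would be the natural replacement for real data.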

ThT binding. To detect amyloid fibril growth, a discontinuous assay was used. 
Aliquots (10 µl) were removed from each purified α-synuclein sample (lyophilized
from 50 mM ammonium acetate, pH 7.4, and agitated at 37 °C at a concentration
of 75 µM in 20 mM Bis-Tris propane, 100 mM LiCl, pH 7.4) and added to 2 ml of a
10 µM ThT solution in 10 mM glycine buffer, pH 9. Fluorescence was directly
quantified on a Varian Eclipse fluorescence spectrophotometer at 20 °C by exciting
at 444 nm and scanning the emission wavelengths from 460-550 nm with slit
widths set at 5 nm (PMT at 750 V).

Quantitative phosphate analysis. Samples (2 × 15 µl and 2 × 30 µl of 1 mg ml⁻¹
α-synuclein in 50 mM ammonium acetate, pH 7.4) were placed at the bottom of
glass test tubes, 225 µl of 8.9 N H2SO4 (in deionized water) was added, and the
mixture was heated for 25 min at 200-215 °C. Next, 75 µl H2O2 was added to all
tubes at room temperature. After heating for 30 min at 200-215 °C, 1.95 ml deionized
water and then 0.25 ml 2.5% ammonium molybdate(VI) tetrahydrate solution
(in deionized water) were added at room temperature. After addition of
0.25 ml 10% ascorbic acid solution (in deionized water), the tubes were heated
for 7 min at 100 °C, and samples were allowed to cool to room temperature.
Absorbance at 820 nm was measured, and phosphate concentration calculated
using a calibration curve obtained from 7 phosphate standard solutions ranging
from 0-50 nmol phosphate (Sigma-Aldrich).

SE-AUC. AUC experiments were performed in a Beckman Optima XL-I analyt- 
ical ultracentrifuge. Sedimentation equilibrium experiments were carried out at 
purified α-synuclein protein concentrations of 1.6, 1.1 and 0.6 mg ml⁻¹ in 50 mM
ammonium acetate, pH 8.5. The experiments were performed at 20 °C at 11,612, 
20,644 and 32,256g (AN-60 Ti rotor), and data were collected at 278 nm. The 
software SEDPHAT (version 6.5) was used to calculate the M and s of the species 
present in equilibrium in the samples. For molecular weight analysis, we used the 
model ‘Species Analysis’ available in the SEDPHAT program with RI noise base- 
line correction. Analysis was performed for each protein concentration separately, 
and the molecular weight determined from the average obtained for the analyses of 
the 3 protein concentrations. The average errors and standard deviations were 
calculated using Monte-Carlo simulation, with 1,000 iterations and a confidence 
level of 0.68.

Solution structure of a minor and transiently formed
state of a T4 lysozyme mutant

Proteins are inherently plastic molecules, whose function often
critically depends on excursions between different molecular con- 
formations (conformers)'*. However, a rigorous understanding 
of the relation between a protein’s structure, dynamics and func- 
tion remains elusive. This is because many of the conformers on its 
energy landscape are only transiently formed and marginally 
populated (less than a few per cent of the total number of mol- 
ecules), so that they cannot be individually characterized by most 
biophysical tools. Here we study a lysozyme mutant from phage T4 
that binds hydrophobic molecules* and populates an excited state 
transiently (about 1 ms) to about 3% at 25 °C (ref. 5). We show that 
such binding occurs only via the ground state, and present the 
atomic-level model of the ‘invisible’, excited state obtained using 
a combined strategy of relaxation-dispersion NMR (ref. 6) and CS- 
Rosetta’ model building that rationalizes this observation. The 
model was tested using structure-based design calculations iden- 
tifying point mutants predicted to stabilize the excited state rela- 
tive to the ground state. In this way a pair of mutations were 
introduced, inverting the relative populations of the ground and 
excited states and altering function. Our results suggest a mech- 
anism for the evolution of a protein’s function by changing the 
delicate balance between the states on its energy landscape. More 
generally, they show that our approach can generate and validate 
models of excited protein states. 

A detailed characterization of the conformers along a protein’s 
energy landscape is important for understanding the structure- 
function relationship and also because such an analysis provides 
insight into fundamental aspects of protein structure and dynamics. 
In this vein, numerous detailed studies of mutant lysozymes and lyso- 
zyme complexes from phage T4 have greatly increased our under- 
standing of the inter-relation between structure, stability, folding 
and motion in proteins*. Among the approximately 700 mutant lyso- 
zymes and lysozyme complexes that have been characterized is a 
family where each member contains an engineered cavity in its hydro- 
phobic core, generated by replacing larger amino acids with alanine 
(ref. 9). The point mutant causing the most pronounced stability 
change involved the replacement of a leucine at position 99 (referred 
to in what follows as L99A T4L), creating a cavity of ~150 Å³ in the
carboxy terminus of the enzyme that is able to bind hydrophobic 
ligands*. Interestingly, X-ray studies showed that the L99A mutant 
undergoes the least rearrangement at the site of mutation, with the 
structure essentially unchanged’. 

Despite the fact that the wild-type and L99A T4L structures are 
virtually identical in the crystalline state, solution NMR studies of 
the L99A mutant indicated that many of the peaks were significantly 
broadened relative to the corresponding resonances in data sets 
recorded of the wild-type protein’. Spectral broadening is indicative 
of dynamics on the microsecond-millisecond timescale®, and in this
case provides a clear indication that cavity creation introduces one or
more dynamic processes that are not observed in the wild-type enzyme. 
Such dynamics can be studied by NMR transverse spin-relaxation 
experiments, in which the relaxation rates of probe nuclei are measured 
as a function of the strength of applied radio-frequency fields*’°. These 
experiments provide a powerful approach to quantify structural transi- 
tions in proteins because they are sensitive to microsecond—millisecond 
exchange processes in which a highly populated ground state (G) inter- 
converts with conformers that can have much lower populations 
(>0.5%), referred to in what follows as excited states (E). 

Initial ¹⁵N Carr-Purcell-Meiboom-Gill (CPMG) relaxation dispersion
NMR experiments indicated that L99A T4L undergoes a dynamic
process involving residues that are proximal to the cavity. The relaxation
data were well fitted to a model in which a highly populated ground
state (97%, 25 °C) interconverts with a second state that because of its 
low population (3%) and short lifetime (~1 ms) is ‘invisible’ in NMR 
spectra’. Using recently developed CPMG dispersion experiments", we 
have obtained nearly all of the backbone ¹H, ¹⁵N and ¹³C chemical
shifts, as well as side-chain methyl ¹³C chemical shifts, of the invisible
excited state with a high level of accuracy (Supplementary Fig. 1, 
Supplementary Tables 1-5). Such chemical shifts are powerful con- 
straints in structure calculations; when combined with computational 
protocols”’* they can be used to calculate accurate folds of small 
proteins, even in the absence of additional information, such as inter- 
nuclear distances'*"*. 

A comparison of the chemical shifts of the ground and excited states 
shows that conformational rearrangements occur in the vicinity of the 
cavity involving the C-terminal region of helix E and helices F, G, H and 
I (Fig. 1a, b). These regions do not become disordered in the excited 
state, as calculated squared order parameters reporting on the amplitudes
of backbone motion from chemical shifts'®, S²_RCI, change little
between states (Fig. 1c). However, a decrease in helix propensity is
noted for the C-terminal region of helix E, with a very significant 
concomitant increase in the helix content for the loop connecting 
helices F and G (Fig. 1d). 

The ¹⁵N, ¹HN, ¹Hα, ¹³Cα and ¹³C′ chemical shifts of the excited state
were used to guide Rosetta ‘loop’ building and refinement'"* to generate 
structural models of the excited state (described in Supplementary 
Information, Supplementary Fig. 2). Only regions with significant 
chemical shift changes (residues 100-120, 132-146; Fig. 1a) were
allowed to deviate from the X-ray structure of the L99A cavity mutant. 
The CS-Rosetta-based excited state conformers so produced are well 
converged, with pair-wise backbone root-mean-squared deviations 
(r.m.s.d.) for ten representative, low energy structures of 0.7 ± 0.2 Å
over the region that was allowed to vary in the calculations (Sup- 
plementary Table 6). As a control, an identical protocol was used to 
generate the structure of the ground state in the vicinity of the cavity 
mutant, based on the same number of chemical shifts as for the excited

Figure 1 | L99A T4L exchanges between ground (visible) and excited
(invisible) states, each with distinct conformations. a, Plot of
\[ \Delta\varpi_{\mathrm{RMS}} = \sqrt{\frac{1}{N}\sum_{i}\left(\frac{\Delta\varpi_{i}}{\Delta\varpi_{i,\mathrm{STD}}}\right)^{2}} \]
as a function of residue, where Δϖ_i is the shift
difference in p.p.m. between states, Δϖ_i,STD is a nucleus specific value that
corresponds to the range of shift values (1 s.d.) that are observed in a database of
protein chemical shifts (http://www.bmrb.wisc.edu) for the nucleus in question
(¹HN, ¹⁵N, ¹³Cα, ¹Hα and ¹³C′) and N is the number of nuclei (≤5) that are
included in the average. Significant Δϖ_RMS differences are localized to a pair of
regions (100-120, 132-146) that are highlighted in grey. The secondary
structure of the ground state of L99A T4L is illustrated. b, Values of Δϖ_RMS
colour-coded onto the X-ray structure of L99A T4L (PDB: 3DMV”*), ranging
from blue (Δϖ_RMS = 0) to red (Δϖ_RMS > 0.7). The mesh surface indicates the
position of the cavity formed by the Leu to Ala substitution at position 99. c, S²
values for the backbone amide groups in the ground (blue) and excited (red) states
of L99A T4L as predicted by the RCI approach”’. d, Helix propensity values,
predicted using TALOS+ (ref. 29), highlighting important changes in secondary
structure between ground (blue) and excited (red) L99A T4L conformers.
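
The per-residue quantity plotted in panel a can be computed as sketched below. This is an illustration under stated assumptions: the per-nucleus standard deviations in `SHIFT_STD_PPM` are hypothetical placeholders (the values used in the paper are derived from the BMRB and are not listed in the text), and the nucleus labels and function name are invented for the example.

```python
import math

# Hypothetical per-nucleus database standard deviations (p.p.m.); the values
# used in the paper come from the BMRB and are not listed in the text.
SHIFT_STD_PPM = {"HN": 0.6, "N": 4.0, "CA": 2.0, "HA": 0.5, "CO": 2.0}


def delta_varpi_rms(shift_differences_ppm):
    """Normalized r.m.s. shift difference for one residue.

    shift_differences_ppm maps nucleus name -> shift difference (p.p.m.)
    between states; nuclei without a measured value are simply omitted.
    """
    terms = [
        (delta / SHIFT_STD_PPM[nucleus]) ** 2
        for nucleus, delta in shift_differences_ppm.items()
    ]
    if not terms:
        raise ValueError("at least one shift difference is required")
    return math.sqrt(sum(terms) / len(terms))


# Hypothetical residue with three measured nuclei.
example = delta_varpi_rms({"HN": 0.3, "N": 2.0, "CA": 1.0})
```

Dividing each difference by a nucleus-specific spread puts ¹H, ¹⁵N and ¹³C shifts on a common scale before averaging, so that no single nucleus type dominates the result.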


state conformer. The lowest energy structures so obtained are in excellent
agreement with the L99A crystal structure (r.m.s.d. of 0.6 ± 0.2 Å,
Supplementary Table 6). Figure 2 shows an overlay of ten low energy
representative excited state structures (Fig. 2a), along with the X-ray 
structure of the ground state (Fig. 2b) for comparison. As predicted on 
the basis of the input chemical shift data (Fig. 1a), there are clear 
structural differences between ground and excited states. These occur 
in a region immediately surrounding the cavity, involving rearrange- 
ment of the pair of short helices F and G that are orthogonal in the 
wild-type structure and that form a single, continuous and nearly 
straight helix in the excited state. This conformational rearrangement 
also includes a significant change in the backbone dihedral angle (ψ)
of Phe 114 to a helical value in the excited state (+49° to −36°) and a
reorientation of its side chain caused by a change in the torsion angle
(χ1) from a gauche− to a trans conformation (see below). The change
in χ1 projects the Phe 114 benzyl moiety into the cavity of L99A T4L,
significantly decreasing its volume (Fig. 2c). 

To cross-validate the excited state structures, we used Rosetta struc- 
ture based design calculations to identify substitutions predicted to 
stabilize the excited state relative to the ground state (Supplementary 
Table 7). One such substitution is G113A, which replaces one of the 
most helix destabilizing residues (Gly) with the most favourable (Ala)'” 
in a region of the structure that is predicted to become more helical in 
the excited state. ¹H-¹⁵N and ¹H-¹³C spectra of L99A,G113A T4L
(recorded at low temperature (1 °C) to slow down the exchange and 
hence improve spectral quality) show two sets of cross-peaks that can 
be connected by magnetization exchange'*"” (Fig. 3a, Supplementary 
Fig. 3). The first set corresponds to those observed for the L99A ground 
state, with a second set occurring at the positions predicted for the 
excited state on the basis of the chemical shifts obtained from CPMG 
relaxation dispersion experiments recorded on L99A T4L (Fig. 3b). 
Intensities of peaks from magnetization exchange experiments can be

Figure 2 | The structure of the invisible, excited state of L99A T4L.

a, Superposition of the 10 lowest energy structures of the L99A T4L excited 
state. The locations of helices E, F and G are indicated, along with the side chain 
of Phe 114 (see c). Only residues 100-120 and 132-146 were allowed to deviate 
from the L99A T4L ground state X-ray structure in calculations of the excited 
conformer (Methods). b, Ground state structure of L99A T4L, showing helices 
E, F and G and the position of Phe 114 (PDB: 3DMV”) or of benzene (green; 
PDB: 3DMX”*) when it is bound inside the cavity (see d). c, d, Expanded regions 
of the excited state (c) and ground state (d) structures, focusing on the 
differences between helices F and G and the position of Phe 114. 


fitted to extract the population of the excited state, p_E, and an exchange
rate, k_ex = k_GE + k_EG (k_ij is the exchange rate from state i to j); values of
p_E = 34 ± 2% and k_ex = 48 ± 1 s⁻¹, at 1 °C, are obtained (Supplementary
Fig. 4, Supplementary Table 8). Thus, the G113A mutation shifts
the G ⇌ E equilibrium, as expected from the excited state structure,
from p_E < 0.5% to 34% at 1 °C.
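
For a two-state equilibrium the fitted population and exchange rate determine the individual rate constants and the free energy gap between the states. The sketch below works this through for the values quoted in the text; the two-state assumption, the function names and the use of the Boltzmann relation are illustrative choices, not a reproduction of the authors' fitting software.

```python
import math

GAS_CONSTANT = 8.314462618  # J mol^-1 K^-1


def two_state_rates(p_excited, k_ex):
    """Split k_ex = k_GE + k_EG into forward and reverse rates.

    At equilibrium p_G * k_GE = p_E * k_EG, so k_GE = p_E * k_ex and
    k_EG = (1 - p_E) * k_ex.
    """
    k_ge = p_excited * k_ex
    k_eg = (1.0 - p_excited) * k_ex
    return k_ge, k_eg


def free_energy_difference(p_excited, temperature_k):
    """G(E) - G(G) in kJ/mol from the Boltzmann population ratio."""
    ratio = p_excited / (1.0 - p_excited)
    return -GAS_CONSTANT * temperature_k * math.log(ratio) / 1000.0


# Values reported in the text for L99A,G113A T4L at 1 degC.
k_ge, k_eg = two_state_rates(0.34, 48.0)
dg = free_energy_difference(0.34, 274.15)
```

With p_E = 34% and k_ex = 48 s⁻¹ this gives k_GE of about 16 s⁻¹ and k_EG of about 32 s⁻¹, and places the excited state roughly 1.5 kJ mol⁻¹ above the ground state at 1 °C.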

In a previous set of studies based on analysis of ¹⁵N and methyl ¹³C
CPMG relaxation dispersion profiles, we speculated that the excited 
state of L99A T4L is an open conformation where ligands can access the 
cavity’. The solution structure of the low populated L99A T4L con- 
former, however, predicts that hydrophobic ligands would not bind the 
excited state because the cavity is occupied by the side chain of Phe 114 
(Fig. 2c). As a second cross-validation of the structure, we measured the 
binding of benzene to the ground and excited conformers independently
using a sample of L99A,G113A T4L, where separate peaks can be
observed for each state (Fig. 3a). A previous study has established that
benzene binds to L99A T4L with a millimolar K_D and a dissociation rate
of close to 1,000 s⁻¹ at 20 °C (ref. 20). Lowering the temperature to 1 °C
decreases both the rate of benzene binding and the rate of exchange 
between ground and excited states; these rates are reduced to the point 
where separate peaks are observed for the methyl group of Met 102 in 
¹H-¹³C HSQC spectra of the ground, excited and benzene-bound states
of L99A, G113A T4L to which one molar equivalent of benzene was 
added (Fig. 3c). Rates of exchange between the three states can be 
quantified by analysis of magnetization exchange experiments'*!””!. 
From fits of the time-dependencies of the auto-peaks (labelled ‘G’,
‘E’, ‘B’ in Fig. 3c, diagonal panels of Fig. 3d) and cross-peaks (cross
panels) to a model of three-site exchange (Fig. 3e and Supplementary

Figure 3 | Hydrophobic ligands do not bind the excited state of L99A T4L.
a, Selected region of the ¹H-¹⁵N correlation map from a magnetization
exchange experiment recorded on L99A,G113A T4L at 1 °C, showing separate
peaks for the ground (G) and excited (E) states. A pair of data sets are obtained,
with the mixing time (T_MIX = 50 ms) recorded before or after the ¹⁵N chemical
shift evolution period, and the data sets subtracted so that diagonal- (cross-)
peaks are positive (negative)**. b, Correlation between Δϖ_N values measured
directly from the spectrum in a (y axis) and corresponding values from CPMG


Table 8), the six relevant rates, k_ij, are extracted. Best fit values for k_EB,
k_BE are <0.1 s⁻¹ (Supplementary Fig. 5), and F-test analyses establish
that there is no difference in the quality of the fits when these rates are
set to zero, indicating that binding to the excited state does not occur. By
contrast, k_GB = 11.6 ± 0.3 s⁻¹ and k_BG = 17.4 ± 0.4 s⁻¹, so that ligand
binding proceeds via the ground state. The mechanism by which this
occurs is not at present known, but it must involve excursions of the 
ground state to additional conformations, presumably on a timescale 
faster than those that are accessible to the dispersion experiments 
described here. 
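
The nested-model comparison mentioned above can be illustrated with the standard F statistic for least-squares fits. The sketch is generic: the chi-square values, the number of data points and the parameter counts are hypothetical, and the text does not say exactly how the authors set up their F-test.

```python
def f_statistic(chi2_restricted, chi2_full, n_points, n_params_full, n_fixed):
    """F statistic for comparing nested least-squares models.

    chi2_restricted: chi-square with n_fixed parameters held at fixed values
    chi2_full: chi-square with all n_params_full parameters free
    """
    dof_full = n_points - n_params_full
    if dof_full <= 0 or n_fixed <= 0:
        raise ValueError("need positive degrees of freedom")
    numerator = (chi2_restricted - chi2_full) / n_fixed
    denominator = chi2_full / dof_full
    return numerator / denominator


# Hypothetical numbers: 90 data points, a 10-parameter full model, and a
# restricted model with k_EB and k_BE fixed at zero (2 parameters removed).
f_value = f_statistic(chi2_restricted=82.0, chi2_full=80.0,
                      n_points=90, n_params_full=10, n_fixed=2)
```

An F value close to 1, as in this made-up case, means the extra parameters do not improve the fit beyond what noise alone would give; the statistic is converted to a probability by comparing it with the F distribution for (n_fixed, n_points − n_params_full) degrees of freedom.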

Whereas the G113A mutation shifts the fractional population of the 
excited state from approximately 3% to 34%, we were interested in 
perturbing the equilibrium still further. The R119P substitution is 
predicted by Rosetta to further favour the excited state because the 
X-ray structure of the L99A ground state’ is incompatible with a Pro at 
position 119 due to steric clashes involving an Hδ proton and the C′ of
Thr 115. Consistent with this prediction, the ¹H-¹⁵N HSQC spectrum
of G113A,R119P,L99A T4L contains a single set of peaks (Supplemen- 
tary Fig. 6) at resonance positions identical to those of the invisible, 
excited state of L99A that were determined by relaxation dispersion 
measurements (Supplementary Fig. 7). Further relaxation dispersion 
experiments recorded on the triple mutant established that the dom- 
inant conformer in solution, corresponding to the ligand inaccessible 
state (Supplementary Table 6), interconverts with a minor state con- 
former whose structure is that of the L99A ground state (Sup- 
plementary Fig. 8) (population of 3.8 + 0.1%, kx = 806+ 28s |, 
1°C). Thus, the pair of mutations G113A,R119P inverts the popula- 
tions of ground and excited states relative to L99A T4L. 

This population inversion, rendering the ligand-inaccessible state 
the major conformer, allows an additional test of the structure of the 
invisible L99A T4L excited state. Quantitative J-based scalar coupling 
experiments™ recorded on L99A,G113A,R119P T4L confirm that the 
χ1 rotamer state for Phe 114 is trans, as observed in the CS-Rosetta
based structure of the excited state (Supplementary Fig. 9,
Supplementary Table 9, see above). By contrast, similar experiments

relaxation dispersion measurements of L99A T4L (25 °C; x axis).
c, Magnetization exchange spectrum (T_MIX = 40 ms) recorded on an
L99A,G113A T4L sample with a 1:1 molar equivalent of benzene, focusing on
Met 102 that shows well resolved correlations from ground, excited and
benzene-bound (B) states. d, Intensity of auto- and cross-peaks for residue
Met 102 from magnetization exchange experiments recorded as a function of
T_MIX (red circles), along with the best fit of the data (solid lines) to the exchange
model of e. Values of rates of exchange, k_ij, are indicated.

recorded on the ground state of L99A T4L are consistent only with a 
gauche− conformation, as expected from the X-ray structure’. The
trans χ1 rotameric state for Phe 114 is a novel feature of the L99A T4L
excited state and L99A,G113A,R119P T4L. In a G113A variant T4L, as
examined here but where Leu is retained at position 99, a gauche−
conformation is observed”? for Phe 114; a trans χ1 angle would lead to
steric clashes with Leu 99 (Supplementary Fig. 10). 

The L99A mutation in T4L creates an energy landscape in which a 
low-lying excited state is transiently populated. We have shown that 
this invisible, excited state has different functional properties from the

Figure 4 | The delicate balance between states on the energy landscape can
be readily manipulated through mutation, providing a path for protein
evolvability. a-c, Selected regions from ¹H-¹³C HSQC spectra (recorded at
1 °C) of L99A T4L (a), L99A,G113A T4L (b) and L99A,G113A,R119P T4L
(c), with the peaks from the ground and excited states coloured in blue and red,
respectively. d-f, Corresponding energy landscapes, showing the structures of
the ground and excited states and their fractional populations.


1 SEPTEMBER 2011 | VOL 477 | NATURE | 113 


©2011 Macmillan Publishers Limited. All rights reserved 


LETTER 


ground state—it does not bind hydrophobic ligands. This divergence 
in function can be controlled through mutation, with the G113A single 
mutation and the G113A,R119P double mutation changing the ratio of 
binding-competent to binding-incompetent states from 97% to 66% 
(G113A) and to less than 5% (G113A,R119P) (Fig. 4). The picture of 
evolving protein function suggested by our studies of L99A T4L is 
consistent with an emerging view of protein plasticity**, with each 
molecule sampling a range of structures. Each unique conformer can, 
in turn, potentially carry out a different function*”*’’. A small number 
of mutations can then shift the relative populations of the conformers, 
thereby changing the activity of the protein, as has been observed in 
directed evolution experiments involving the introduction of muta- 
tions into flexible loop regions of enzymes. Insight into the relation 
between protein dynamics, structure and evolvability is greatly facilitated
through the powerful combination of relaxation dispersion NMR
and Rosetta enabling individual protein states that populate the energy 
landscape to be investigated, even in cases where these conformers are 
invisible to other biophysical techniques. As applications of this meth- 
odology continue to grow, so too will our understanding of how pro- 
tein dynamics control function, increasing the scope for the rational 
design of proteins with specific properties. 


METHODS SUMMARY 

Protein expression and purification. All NMR samples were prepared following 
previously published protocols, as described in detail in Methods. 

NMR experiments and structure calculations. NMR experiments were recorded 
and analysed as described in Methods. Structure calculations of the L99A T4L 
excited state were based on experimental backbone ¹⁵N, ¹³C and ¹H chemical shifts
obtained from CPMG dispersion experiments. The structures of regions in the 
excited state with significant variations in chemical shift values relative to those in 
the ground state were computed using the loop modelling application of Rosetta’®. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature.

METHODS


Protein expression and purification. The gene for expression of L99A T4L was 
optimized for protein production (Genscript) and placed in a pET-29b plasmid. 
Additional plasmids for the expression of the mutants used in this study 
(L99A,G113A T4L; L99A,G113A,R119P T4L) were constructed from the L99A 
T4L plasmid. T4L proteins were expressed in Escherichia coli BL21(DE3) cells
grown in M9 minimal media with glucose (3-4 g l⁻¹, typically) and ammonium
chloride (NH4Cl, 1 g l⁻¹) as the sole carbon and nitrogen sources, respectively.
Proteins with specific labelling patterns (see below) were obtained by expression in
M9 media containing the appropriately labelled glucose, ¹⁵NH4Cl and solvent
(H2O or D2O)*". The labelling patterns (carbon source/solvent) used in this study
are: uniform (U) ¹⁵N (glucose/H2O), U-¹³C/¹⁵N ([¹³C6]-glucose/H2O),
U-²H/¹³C/¹⁵N ([¹³C6,²H7]-glucose/100% D2O), U-²H/¹⁵N ([²H7]-glucose/100%
D2O), ¹³Cα/¹⁵N ([2-¹³C]-glucose/100% H2O), U-¹³C/U-¹⁵N/50% ²H ([¹³C6,²H7]-glucose/50%
D2O), ¹³CH3-Met/¹⁵N (glucose/H2O, supplemented with 100 mg l⁻¹
of ¹³CH3-Met added to the media 30 min before induction of protein overexpression).
E. coli BL21(DE3) cells transformed with the appropriate plasmid were
grown in one or two litres of media at 37 °C until an OD600 of ~1. The temperature
was then reduced to 16 °C and protein expression was induced with 1 mM IPTG 
for 14-18h. The cells were harvested by centrifugation and frozen. The protein 
was purified from the cells as described’. 

Samples. NMR samples (~1.5 mM in protein) were prepared in a buffer consisting
of 50 mM sodium phosphate, 25 mM NaCl, 2 mM EDTA, 2 mM NaN3, pH 5.5
in either 10% or 100% D2O. The experiments were performed on Varian Inova
spectrometers operating at frequencies (¹H) of 500, 600 and 800 MHz, at a temperature
of 25 °C, unless stated otherwise.

Assignments. Complete assignments for L99A T4L have been reported previ- 
ously’. The major (ground) state peaks in the L99A,G113A T4L mutant were 
assigned by comparison with assigned spectra of L99A T4L. Minor (excited) state
¹H-¹⁵N resonance assignments were obtained using a ¹⁵N magnetization
exchange experiment” recorded at 800 MHz, 1 °C, with a mixing time (T_MIX) of
50 ms. At 1 °C the excited state is populated to ~34%, k_ex ≈ 50 s⁻¹, so that very
clear exchange peaks correlate ground and excited state resonances. Backbone
¹H/¹³C/¹⁵N and ¹³Cβ assignments for the L99A,G113A,R119P T4L mutant were
obtained using standard triple resonance experiments” recorded at 34 °C either at 
500 or 800 MHz. Assignments were very close to complete. The elevated temper- 
ature (34 °C) and lower field (500 MHz) were used to minimize signal loss due to 
chemical exchange. 

CPMG experiments. ¹H/¹⁵N/¹³C constant-time (CT) CPMG relaxation dispersion
experiments were (usually) performed at two static magnetic fields. Typically,
dispersion curves were composed of a large number (~15) of ν_CPMG values with
errors estimated based on two or three repeat values**. Here ν_CPMG = 1/(4τ_CPMG),
where 2τ_CPMG is the spacing between the refocusing π pulses applied during the
CT delay of length T_relax. Details of the experiments***”* used to characterize the
excited state of L99A T4L are summarized in Supplementary Table 1.
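
The relation between pulse spacing and CPMG frequency can be made concrete with a short sketch. The function names and the example values (a 40 ms constant-time delay and τ_CPMG = 0.5 ms) are hypothetical; the sketch also ignores any pulse-sequence-specific restriction on which pulse counts are allowed.

```python
def nu_cpmg(tau_cpmg_s):
    """CPMG frequency (Hz) for a pulse spacing of 2 * tau_cpmg_s seconds."""
    return 1.0 / (4.0 * tau_cpmg_s)


def pulses_in_delay(nu_cpmg_hz, t_relax_s):
    """Number of refocusing pulses that fill the constant-time delay.

    Pulses are spaced by 2 * tau_CPMG, so the count is
    T_relax / (2 * tau_CPMG) = 2 * nu_CPMG * T_relax.
    """
    return 2.0 * nu_cpmg_hz * t_relax_s


# Hypothetical example: tau_CPMG = 0.5 ms within a 40 ms constant-time delay.
example_nu = nu_cpmg(0.0005)
example_pulses = pulses_in_delay(example_nu, 0.040)
```

In this example a 1 ms pulse spacing corresponds to ν_CPMG = 500 Hz and 40 refocusing pulses during the delay.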

Sign experiments. Single-quantum CPMG relaxation dispersion experiments
provide only the magnitude of the change in chemical shift, |Δϖ_GE| = |ϖ_E − ϖ_G|,
between the two exchanging states. Signs were obtained by comparing the positions
of the ground state peaks in HSQC spectra recorded at different static magnetic
fields and/or between peak positions in HSQC/HMQC spectra recorded at
the same field’””*. Once the signs of amide nitrogen Δϖ values were obtained, the
corresponding signs of the amide protons were generated from zero-quantum 
(ZQ) and double-quantum (DQ) CPMG experiments”. Experiments used to 
obtain this information are listed in Supplementary Table 2. 

Quantitative J-modulated experiments. χ1 angles of aromatic residues in both
L99A T4L (ground state) and L99A,G113A,R119P T4L (ground state that is a
mimic of the L99A T4L excited state structure, see Supplementary Fig. 7 and text)
were determined by measurement of three-bond J_C′Cγ and J_NCγ scalar couplings
using quantitative J-based experiments“° recorded with dephasing delays of
100 ms (J_C′Cγ) and 120 ms (J_NCγ), respectively. All experiments were obtained at
35 °C on a 600 MHz spectrometer, using uniformly ¹⁵N, ¹³C enriched samples.
Difference spectra for both experiments and both proteins are shown in
Supplementary Fig. 9. Measured scalar coupling values are summarized in
Supplementary Table 9 along with the χ1 angles for the aromatic residues of the
ground and excited state structures.

Magnetization-exchange experiments. ¹⁵N magnetization exchange experiments,
recorded at 1 °C, 800 MHz, were used for assignment of excited state
correlations of L99A,G113A T4L, as described above. A pair of experiments was
recorded in which the exchange mixing period, $T_{MIX}$ (50 ms), was placed (1)
before and (2) after indirect detection of ¹⁵N magnetization. Subtraction of the
two data sets so obtained generates a two-dimensional spectrum where
correlations from ground and excited states (positive) are connected by cross-peaks
(negative), forming a 'rectangular' structure (Fig. 3a). Quantitative methyl ¹³C
magnetization exchange experiments to quantify exchange, G ⇌ E with rate
constants $k_{GE}$ and $k_{EG}$ (see below or text), in L99A,G113A T4L were
performed at 600 MHz, 1 °C, using a sample in which only Met-Cε was ¹³C
enriched. A second similarly labelled sample to which
a small amount of benzene was added (approximately 1:1 molar equivalents of
benzene and protein) was used to study exchange between the ground, excited and
benzene-bound states of L99A,G113A T4L. Experiments were recorded with $T_{MIX}$
values ranging between 0 and 85 ms (100 ms in the presence of benzene) with
errors estimated based on repeat measurements.
Data processing. The NMRpipe software package was used to process all of the
NMR data. Subsequent visualization and peak picking were achieved using the
program Sparky. The intensities of peaks (I) were obtained using the program FuDA
(http://pound.med.utoronto.ca/software.html), while the CcpNmr set of programs
was used to analyse some of the triple resonance assignment experiments.
Analysis of CPMG data. Relaxation dispersion (RD) profiles, $R_{2,\mathrm{eff}}(\nu_{CPMG})$, were
generated from peak intensities, $I(\nu_{CPMG})$, measured in a series of 2D correlation
maps recorded at various CPMG frequencies, $\nu_{CPMG}$. The effective relaxation
rates, $R_{2,\mathrm{eff}}(\nu_{CPMG})$, were computed via the relation:

$$R_{2,\mathrm{eff}}(\nu_{CPMG}) = -\frac{1}{T_{Relax}} \ln\left(\frac{I(\nu_{CPMG})}{I_0}\right)$$

where $I_0$ is the peak intensity extracted from a reference spectrum recorded
without the CPMG block. RD profiles were analysed assuming a two-site exchanging
system, G ⇌ E, where the major state, G, interconverts with the minor state, E,
as described previously in the context of the L99A system. The model
parameters defining the chemical exchange process, that is the exchange rate, $k_{ex}$, the
population of the minor state, $p_E$, and the absolute difference in chemical shifts
between the two states, $|\Delta\varpi_{GE}| = |\varpi_E - \varpi_G|$, were determined by minimizing the
target function:


$$\chi^2(\zeta) = \sum \left( \frac{R_{2,\mathrm{eff}}^{\mathrm{calc}}(\zeta) - R_{2,\mathrm{eff}}^{\mathrm{exp}}}{\Delta R_{2,\mathrm{eff}}^{\mathrm{exp}}} \right)^2$$

where $R_{2,\mathrm{eff}}^{\mathrm{exp}}$ and $\Delta R_{2,\mathrm{eff}}^{\mathrm{exp}}$ are the experimental effective transverse relaxation rates
and their associated uncertainties, $R_{2,\mathrm{eff}}^{\mathrm{calc}}(\zeta)$ are back-calculated relaxation rates
obtained by numerical integration of the Bloch-McConnell equations using
the program CATIA (http://pound.med.utoronto.ca/software.html), $\zeta$ represents
the set of adjustable model parameters and the sum is over all the experimental
data points.
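
The conversion of peak intensities into effective rates, and the evaluation of the target function, can be sketched as follows. This is a minimal illustration and not the CATIA implementation: the function names, the 40 ms relaxation delay and all intensity values are hypothetical, and the back-calculated rates that would normally come from a Bloch-McConnell simulation are simply passed in as a list.

```python
import math


def r2_eff(intensity, ref_intensity, t_relax):
    """Effective rate (s^-1): R2,eff = -(1/T_relax) * ln(I / I0)."""
    return -math.log(intensity / ref_intensity) / t_relax


def chi_square(calc, exp, err):
    """Sum over data points of ((calc - exp) / err)^2."""
    return sum(((c - e) / s) ** 2 for c, e, s in zip(calc, exp, err))


# Hypothetical inputs: CPMG frequency (Hz) -> peak intensity (arbitrary units).
T_RELAX = 0.040   # assumed constant-time relaxation delay, in seconds
I_REF = 1000.0    # assumed intensity in the reference spectrum (no CPMG block)
intensities = {50.0: 450.0, 200.0: 520.0, 1000.0: 640.0}

# One R2,eff value per CPMG frequency gives the dispersion profile.
profile = {nu: r2_eff(i, I_REF, T_RELAX) for nu, i in intensities.items()}
for nu, rate in sorted(profile.items()):
    print(f"nu_CPMG = {nu:6.1f} Hz   R2,eff = {rate:5.2f} s^-1")
```

In a real analysis `chi_square` would be called inside an optimizer, with `calc` regenerated from the exchange model for each trial set of parameters.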

Analysis of quantitative magnetization exchange data. As described above and
in the text, methyl ¹³C magnetization exchange experiments were recorded on
L99A,G113A T4L (1 °C) without and with added benzene using a pulse scheme
described previously. Data from Met 102 were analysed because separate,
well-resolved correlations are obtained for the ground, excited and benzene-bound
states that could be accurately quantified. At the low temperature used (1 °C),
the interconversion between ground and excited states as well as benzene binding
are in the slow exchange regime, a requirement for the magnetization exchange
experiment. The intensity I of auto- (cross-) peaks corresponding to
magnetization that is not (is) transferred between states during a mixing time, $T_{MIX}$, has been
analysed by numerically solving the Bloch-McConnell equations for the evolution
of magnetization in the presence of chemical exchange. The time dependence of
magnetization during the entire pulse sequence was simulated (details are
available from the authors on request). The resonance frequencies of the peaks and
their transverse relaxation rates were obtained from the positions and linewidths
of the peaks in spectra, respectively. The fitted parameters include the total
magnetization, longitudinal relaxation rates ($R_1$) for each state, the fractional
populations of each state, $p_i$ (subject to the constraint that $\sum_i p_i = 1$), and the rates of
exchange between states i and j, $k_{ex,ij} = k_{ij} + k_{ji}$. The fitting parameters were
optimized using a simplex procedure to minimize the function:

$$\chi^2 = \sum_{i=1}^{N} \left( \frac{I_i^{\mathrm{exp}} - I_i^{\mathrm{calc}}}{\sigma_i^{\mathrm{exp}}} \right)^2$$

Here the summation is over all the experimental data points, $I_i^{\mathrm{exp}}$ is the
experimental intensity, $I_i^{\mathrm{calc}}$ is the calculated intensity and $\sigma_i^{\mathrm{exp}}$ is the error in the
intensity. In the case of two-state exchange (no benzene), 48 data points from
four peaks associated with Met 102 (two auto- and two cross-peaks × 12 $T_{MIX}$
values) were fitted using five fitting parameters. For the three-state exchange
(approximately 1:1 molar equivalent protein:benzene), 117 data points from nine
Met 102 peaks (three auto- and six cross-peaks × 13 $T_{MIX}$; Fig. 3d) were fitted
using nine fitting parameters. The minimum error in the intensities was assumed
to be 3%. Errors in the fitted parameters were estimated using a Monte Carlo
procedure. Here 50 synthetic data sets were generated using the best-fit
parameters in which random error was added to magnetization intensities and ¹³C/¹H
$R_2$ values (based on the experimental errors) and each of the data sets fitted as per
the experimental data. Errors are calculated as 1 s.d. in the extracted values.
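
For the two-state case, the fitted model can be illustrated with a much simpler calculation than the one used here. The sketch below is an assumption-laden stand-in: it treats only longitudinal exchange during the mixing time (the full pulse-sequence simulation described above is not reproduced), uses the analytical 2 × 2 solution of the Bloch-McConnell equations, and takes the same five parameters (total magnetization, $p_E$, $k_{ex}$ and one $R_1$ per state). All numerical values, function names and the error floor are hypothetical.

```python
import math


def two_state_exchange(t_mix, m0, p_e, k_ex, r1_g, r1_e):
    """Return (I_GG, I_EE, I_GE, I_EG) after a mixing time t_mix (seconds).

    Longitudinal magnetization of states G and E obeys dM/dt = -L M with
    L = [[r1_g + k_ge, -k_eg], [-k_ge, r1_e + k_eg]]; the 2x2 matrix
    exponential is written out analytically. Requires k_ex > 0 and 0 < p_e < 1.
    """
    p_g = 1.0 - p_e
    k_ge, k_eg = p_e * k_ex, p_g * k_ex   # detailed balance: p_g*k_ge = p_e*k_eg
    a11, a22 = r1_g + k_ge, r1_e + k_eg
    root = math.sqrt((a11 - a22) ** 2 + 4.0 * k_ge * k_eg)
    lam1, lam2 = 0.5 * (a11 + a22 + root), 0.5 * (a11 + a22 - root)
    e1, e2 = math.exp(-lam1 * t_mix), math.exp(-lam2 * t_mix)
    i_gg = m0 * p_g * ((a11 - lam2) * e1 - (a11 - lam1) * e2) / root
    i_ee = m0 * p_e * ((a22 - lam2) * e1 - (a22 - lam1) * e2) / root
    i_ge = m0 * p_g * k_ge * (e2 - e1) / root   # starts on G, ends on E
    i_eg = m0 * p_e * k_eg * (e2 - e1) / root   # starts on E, ends on G
    return i_gg, i_ee, i_ge, i_eg


def simulate(times, params):
    """Flatten the four peak intensities at every mixing time into one list."""
    return [x for t in times for x in two_state_exchange(t, *params)]


def chi_square(calc, exp, err):
    """Sum over data points of ((exp - calc) / err)^2."""
    return sum(((e - c) / s) ** 2 for c, e, s in zip(calc, exp, err))


# Hypothetical example: 12 mixing times between 0 and 85 ms, four peaks each.
times = [i * 0.085 / 11 for i in range(12)]
true_params = (1.0, 0.3, 40.0, 1.5, 1.5)     # m0, p_E, k_ex, R1(G), R1(E)
observed = simulate(times, true_params)       # noise-free stand-in for data
errors = [max(0.03 * abs(v), 0.003) for v in observed]  # assumed error floor

trial_params = (1.0, 0.3, 60.0, 1.5, 1.5)
print(len(observed), chi_square(simulate(times, trial_params), observed, errors))
```

A simplex minimization of `chi_square` over the five parameters could then be run with, for example, `scipy.optimize.minimize(..., method="Nelder-Mead")`; that step is left out to keep the sketch dependency-free.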

These experiments clearly indicate that only the ground state binds benzene. As
shown in Fig. 3d, fits of the time dependencies of diagonal- and cross-peaks from
magnetization exchange spectra establish that $k_{EB}$, $k_{BE}$ < 0.1 s⁻¹. To further
support the result that benzene does not bind the excited state, Supplementary Fig. 5
plots the reduced $\chi^2$ obtained from the fit of the magnetization exchange data as a
function of $k_{ex,BE} = k_{BE} + k_{EB}$. A clear minimum occurs for $k_{ex,BE} \approx 0$ ($\chi^2_{red}$ = 1.1).
Of note, when $k_{ex,BE}$ is fixed to the relatively small value of 0.5 s⁻¹, $\chi^2_{red}$ increases by
fivefold (to 5.4), clearly indicating that $k_{ex,BE}$ is very small. From the principle of
detailed balance for the equilibrium denoted in Fig. 3e, it is predicted that the ratios
$p_E/(p_E + p_G)$ and $k_{GE}/k_{EG}$ will be independent of ligand (benzene) concentration,
as observed to within experimental error (Supplementary Table 8). As expected, binding of
benzene shifts the equilibrium from the excited (binding incompetent) to the
ground/bound states. These results are in complete agreement with the structure
of the excited state of L99A T4L, showing clearly that Phe 114 is inserted into the
cavity, hence obstructing the binding of hydrophobic ligands.
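
The detailed-balance argument can be checked numerically with a small sketch. It assumes a linear scheme in which only the ground state binds ligand (E ⇌ G ⇌ B, with B used here as a label for the benzene-bound state); the rate constants, dissociation constant and free-ligand concentrations are hypothetical values chosen for illustration.

```python
def populations(k_ge, k_eg, ligand_conc, k_d):
    """Equilibrium populations (p_G, p_E, p_B) for the scheme E <-> G <-> B.

    Detailed balance gives p_E / p_G = k_ge / k_eg and p_B / p_G = [L] / K_d,
    where [L] is the free ligand concentration (same units as k_d).
    """
    w_g = 1.0
    w_e = k_ge / k_eg
    w_b = ligand_conc / k_d
    total = w_g + w_e + w_b
    return w_g / total, w_e / total, w_b / total


# Hypothetical numbers: k_GE = 10 s^-1, k_EG = 30 s^-1, K_d = 1 mM.
for conc in (0.0, 0.5e-3, 2.0e-3):
    p_g, p_e, p_b = populations(10.0, 30.0, conc, 1.0e-3)
    print(f"[L] = {conc:.1e} M  p_E = {p_e:.3f}  p_B = {p_b:.3f}  "
          f"p_E/(p_E+p_G) = {p_e / (p_e + p_g):.3f}")
```

Adding ligand lowers $p_E$ because population is drawn into the bound state, while $p_E/(p_E + p_G)$ stays fixed at $k_{GE}/(k_{GE} + k_{EG})$, which is the behaviour the text describes.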

L99A excited state chemical shifts. Excited state amide nitrogen, amide proton
and carbonyl chemical shifts are available for nearly all the (164) L99A T4L
residues (Supplementary Tables 3-5). In cases where the sign of $\Delta\varpi_{GE}$ is obtained,
the chemical shift of the excited state is readily calculated, $\varpi_E = \varpi_G + \Delta\varpi_{GE}$. In
cases where the sign is not available, the excited state chemical shift is ambiguous
but restricted to two values, $\varpi_E = \varpi_G + \Delta\varpi_{GE}$ or $\varpi_E = \varpi_G - \Delta\varpi_{GE}$. The
magnitude of the change in chemical shift, $|\Delta\varpi_{GE}|$, also provides useful information
(see section on Rosetta calculations below).
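
The bookkeeping for signed and unsigned shift differences can be sketched as below. The snippet lists the candidate excited state shifts for one nucleus and, when two candidates remain, picks the one closest to a predicted value. The function names and the example shifts (in ppm) are hypothetical, and the closest-match rule is shown only as one simple way to compare an ambiguous shift with a prediction.

```python
def excited_state_candidates(shift_g, delta, sign_known):
    """Candidate excited state shifts (ppm) for one nucleus.

    With a known sign there is a single value, shift_g + delta; otherwise
    only |delta| is usable and both shift_g + |delta| and shift_g - |delta|
    remain possible.
    """
    if sign_known:
        return [shift_g + delta]
    return [shift_g + abs(delta), shift_g - abs(delta)]


def best_match(candidates, predicted):
    """Pick the candidate shift that agrees best with a predicted shift."""
    return min(candidates, key=lambda s: abs(s - predicted))


# Hypothetical amide nitrogen: ground state 120.0 ppm, |delta| = 1.5 ppm.
candidates = excited_state_candidates(120.0, 1.5, sign_known=False)
print(candidates, best_match(candidates, predicted=118.2))
```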

Calculation of L99A T4L excited state structures. The structure of the excited
state of L99A T4L was obtained by using the CS-Rosetta approach developed for the
determination of ground state protein structures, with a number of important
differences relative to the standard protocol. Based on the $\Delta\varpi_{RMS}$ values (see text),
we assumed that only the regions encompassing residues 100-120 and 132-146
adopt different conformations in the ground and excited states. The structure of
the rest of the molecule was fixed to the ground state crystal structure of L99A T4L
(3DMV). The structures of these two regions in the excited state were computed
using the loop modelling application of Rosetta. As a control, identical CS-Rosetta
computations of the ground state structure were also performed using the same
limited set of ground state shifts (that is, those that are available for the excited state). To
avoid any bias, T4 lysozyme structures were removed from the fragment databases in
all the computations described here. Two hundred starting 3mer and 9mer fragments
were selected for each position using the CS-Rosetta approach, modified to include
ambiguous excited state shifts in cases where the sign of $\Delta\varpi_{GE}$ could not be
determined. Fragments were scored against ambiguous shifts by selecting the shift which
agreed best with the one predicted for the fragment. Ambiguous chemical shifts were
similarly taken into account during the scoring of the final structures. The selected
fragments were then used in a standard Rosetta loop modelling protocol to generate
9,600 structures (the target secondary structure propensities input into Rosetta are
those predicted by TALOS+). Supplementary Fig. 2a and b plot the energies of the
resultant ground state structures, generated with ground state chemical shifts
(Supplementary Fig. 2a, CS-Rosetta energy; Supplementary Fig. 2b, chemical shift
component of the CS-Rosetta energy term) versus r.m.s.d. to the L99A ground state
X-ray structure. The characteristic funnel shape energy profile so obtained is an
excellent indicator of convergence and indeed the lowest energy 10 (50) structures
have pair-wise backbone r.m.s.d. values of 0.6 ± 0.15 (0.55 ± 0.15) Å relative to the
ground state crystal structure, including only those residues that were allowed to
move in the calculations. The corresponding plots for the excited state structures are
shown in Supplementary Fig. 2c and d.

Some of the low energy CS-Rosetta structures produced with the excited state 
chemical shifts are very similar to the ground state structure (Supplementary Fig. 
2c). This is not surprising, given the experimental finding that the ground state is 
more stable than the excited state by ~2 kcal mol⁻¹. However, the chemical shift
score by itself clearly indicates that the ground state structure is not a ‘good’ 
solution (Supplementary Fig. 2d). Hence a two-step selection procedure was used. 
Out of the 9,600 structures that were generated initially, 960 structures with the 
best CS-Rosetta score were selected for further analysis. As a second selection step, 
96 of these structures with the best chemical shift score were retained. In this way, 
structures with both low energy CS-Rosetta and low energy chemical shift scores 
are selected. It is noteworthy that the structures so generated were essentially 
identical irrespective of whether selection was first performed on the basis of 
the Rosetta score followed by selection according to chemical shifts (as described 
here) or whether the opposite order of scoring was used. We find that there is a 
major cluster of structures with an r.m.s.d. of ~1.4 Å to the ground state crystal
structure and a much smaller cluster with an r.m.s.d. of ~1 Å to the ground state
conformer; the final 96 structures selected are indicated in green in Supplementary 
Fig. 2c and d. Only one out of the forty lowest energy structures after the second 
round of filtering is part of the second cluster (10% of the 96 structures). A major 
difference between the two sets of structures is the $\chi_1$ angle of Phe 114, with this
dihedral angle assuming a trans (gauche−) conformation in the major (minor)
cluster. Quantitative-J experiments recorded on the L99A,G113A,R119P T4L
mutant that is an excellent mimic of the excited state (Supplementary Fig. 7,
Supplementary Table 6, see text) clearly show that the trans conformation is the
only one populated (Supplementary Fig. 9, Supplementary Table 9). Notably,
CS-Rosetta computations performed with the full set of chemical shifts for this mutant
(L99A,G113A,R119P T4L) produced structures that have a conformation that is
essentially identical to that obtained for the excited state of L99A T4L
(Supplementary Table 6). All structure calculations were performed on the
University of Toronto SciNet super-computer cluster. Each set of 9,600
structures required 10 h of computational time using 256 processor cores. Pymol and
Chimera were used to visualize and analyse the resultant structures.
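
The two-step selection described above (9,600 structures, then the 960 with the best CS-Rosetta score, then the 96 of those with the best chemical shift score) amounts to two successive sort-and-truncate passes. The sketch below uses randomly generated stand-in scores; the dictionary keys `rosetta_score` and `cs_score` and the function name are hypothetical, and lower scores are assumed to be better.

```python
import random


def two_step_selection(structures, keep_first=960, keep_second=96):
    """Keep the best `keep_first` by total score, then the best
    `keep_second` of those by chemical shift score (lower is better)."""
    by_total = sorted(structures, key=lambda s: s["rosetta_score"])[:keep_first]
    return sorted(by_total, key=lambda s: s["cs_score"])[:keep_second]


# Stand-in decoy set with random scores (not real CS-Rosetta output).
rng = random.Random(0)
decoys = [
    {"id": i, "rosetta_score": rng.gauss(0.0, 1.0), "cs_score": rng.gauss(0.0, 1.0)}
    for i in range(9600)
]

selected = two_step_selection(decoys)
print(len(selected))
```

Swapping the two sort keys gives the opposite order of scoring, which the text reports as yielding essentially identical structures.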
Mutations shifting ground and excited state populations. Predictions of free
energy differences (ΔΔG) were performed as described previously. Briefly, single point
mutants of the 19 amino acids (except cysteine) were made in silico in the region
corresponding to residues 105-120 and ΔΔG values were computed for
representative structures of both ground and excited states. The crystal structure of L99A
T4L (3DMV) was used as the ground state, with a representative low energy
structure obtained from CS-Rosetta simulations performed with excited state
chemical shifts used for the excited state. The free energy difference between
corresponding point mutants in the excited and ground states was computed
($\Delta\Delta G_E - \Delta\Delta G_G$; negative values indicate relative stabilization of the excited state).
We screened for single point mutants that energetically favoured the excited state
conformation and simultaneously disfavoured the ground state conformation
(Supplementary Table 7). Mutations for experimental characterization were
selected according to two criteria: secondary structure propensity of the excited
state and Rosetta ΔΔG predictions.
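
The screening step can be illustrated with a short filter over per-mutant ΔΔG values. This is a sketch under stated assumptions: negative ΔΔG is taken to mean that a mutation stabilizes the corresponding state, the mutant labels are placeholders, and the energies (nominally in kcal mol⁻¹) are invented for the example rather than taken from Supplementary Table 7.

```python
def screen_mutants(ddg_ground, ddg_excited):
    """Return (mutation, ddG_E - ddG_G) for mutants predicted to favour the
    excited state (ddG_E < 0) and disfavour the ground state (ddG_G > 0),
    sorted so the strongest relative stabilization of E comes first."""
    hits = []
    for mutation, ddg_g in ddg_ground.items():
        ddg_e = ddg_excited[mutation]
        if ddg_e < 0.0 and ddg_g > 0.0:
            hits.append((mutation, ddg_e - ddg_g))
    return sorted(hits, key=lambda h: h[1])


# Placeholder mutants with made-up predicted energies.
ddg_ground = {"mut_1": 1.2, "mut_2": -0.5, "mut_3": 0.4}
ddg_excited = {"mut_1": -0.8, "mut_2": -1.0, "mut_3": 0.9}

for mutation, diff in screen_mutants(ddg_ground, ddg_excited):
    print(mutation, round(diff, 2))
```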
Protection of repetitive DNA borders from
self-induced meiotic instability


Gerben Vader, Hannah G. Blitzblau, Mihoko A. Tame, Jill E. Falk, Lisa Curtin & Andreas Hochwagen


DNA double strand breaks (DSBs) in repetitive sequences are a 
potent source of genomic instability, owing to the possibility of 
non-allelic homologous recombination (NAHR). Repetitive 
sequences are especially at risk during meiosis, when numerous 
programmed DSBs are introduced into the genome to initiate meiotic
recombination. In the repetitive ribosomal DNA (rDNA) array of
the budding yeast Saccharomyces cerevisiae, meiotic DSB formation
is prevented in part through Sir2-dependent heterochromatin
formation. Here we show that the edges of the rDNA array are
exceptionally susceptible to meiotic DSBs, revealing an inherent
heterogeneity in the rDNA array. We find that this localized DSB
susceptibility necessitates a border-specific protection system
consisting of the meiotic ATPase Pch2 and the origin recognition
complex subunit Orc1. Upon disruption of these factors, DSB formation
and recombination increased specifically in the outermost rDNA
repeats, leading to NAHR and rDNA instability. Notably, the
Sir2-dependent heterochromatin of the rDNA itself was responsible
for the induction of DSBs at the rDNA borders in pch2Δ cells. Thus,
although the activity of Sir2 globally prevents meiotic DSBs in the 
rDNA, it creates a highly permissive environment for DSB forma- 
tion at the junctions between heterochromatin and euchromatin. 
Heterochromatinized repetitive DNA arrays are abundant in most 
eukaryotic genomes. Our data define the borders of such chromatin 
domains as distinct high-risk regions for meiotic NAHR, the pro- 
tection of which may be a universal requirement to prevent meiotic 
genome rearrangements that are associated with genomic diseases 
and birth defects. 

To understand better the mechanisms that protect repetitive DNA 
from meiotic NAHR, we analysed the single tandem rDNA array of 
budding yeast. Meiotic DSB formation and recombination in the rDNA 
are repressed by the histone deacetylase Sir2 (refs 2, 3). Additionally, 
Pch2, a widely conserved meiosis-specific ATPase, suppresses meiotic
recombination in the rDNA by an unknown mechanism. We used
clamped-homogeneous electric field (CHEF) electrophoresis and
Southern blotting of excised rDNA arrays to address whether Pch2
regulates meiotic DSB formation in the rDNA. Consistent with
previous results, the level of full-length rDNA arrays that remained 8 h
after meiotic induction was significantly reduced in sir2Δ mutants
compared to wild-type cells, indicating increased DSB formation
(Fig. 1a and Supplementary Fig. 1a). By contrast, no such reduction
occurred in pch2Δ mutants, although we observed a tenfold increase in
crossover recombination across the rDNA array (Fig. 1a, b). Because
small changes in array length would not be detectable by the CHEF gel
assay, we wondered whether DSB formation in pch2Δ mutants
occurred specifically in the outermost rDNA repeats. To test this
possibility, we generated pch2Δ strains carrying a URA3 insertion at
defined positions in the rDNA array (Fig. 1c) and analysed the rDNA
repeat units that directly flank these insertions by Southern blotting.
We observed a strongly DSB-prone site in repeat 1 and weak DSB
formation in repeat 3, whereas no DSB formation was detectable in
repeat 10 of the approximately 100 rDNA repeats (Fig. 1d). Thus,
pch2Δ cells undergo increased meiotic DSB formation predominantly
in the outermost rDNA repeats.

To determine whether PCH2 suppresses DSB formation only within
the rDNA, or also in other regions of the genome, we first analysed a
chromosomal fragment spanning the junction between single-copy
DNA and rDNA in a pch2Δ mutant by Southern blotting. We observed
additional, strong DSB formation in the adjacent single-copy sequences
(Fig. 1e and Supplementary Fig. 1b), which were previously shown to
have exceptionally low levels of meiotic DSBs in PCH2 cells (Fig. 1f).
The observed break sites behaved similarly to known meiotic DSBs:
they were induced during meiosis in dmc1Δ and DMC1 cells (Fig. 1d, e
and Supplementary Fig. 1c), depended on the meiotic DSB machinery
(Supplementary Fig. 1d), promoted meiotic recombination
(Supplementary Fig. 1e) and occurred in gene promoters (Fig. 1e and
Supplementary Fig. 1b). Indeed, even the DSBs observed in repeat 1
mapped to the promoter of a gene (TAR1) that is encoded in every
rDNA repeat (Fig. 1e). Genome-wide analysis of DSBs in pch2Δ cells
showed that strong DSB induction occurred in 30-50-kilobase (kb)
regions of single-copy sequence abutting both sides of the rDNA
(Fig. 1f). Increased DSB formation was also observed close to other
heterochromatic regions (telomeres and HML), whereas the DSB
landscape elsewhere in the genome was not markedly altered
(Supplementary Figs 1f, g, 2 and Supplementary Table 1). In contrast to pch2Δ
mutants, the loss of SIR2 did not lead to increased DSB formation
adjacent to the rDNA array (Fig. 1f). Thus, Pch2 represses
recombination in the rDNA at the level of DSB formation, but in a manner distinct
from Sir2.

We investigated whether the increased DSB formation in the 
outermost rDNA repeats in pch2Δ mutants (Fig. 1d) resulted in a local
increase in rDNA recombination. We measured recombination rates 
using flanking markers to the left and right of the rDNA together with 
a collection of single URA3 insertions tiling inwards from the left side 
of the rDNA (Fig. 1c). Analysis of a URA3 insertion in the centre of the 
rDNA (inserted next to repeat 49 of 99) indicated that recombination 
occurred in a symmetrical pattern. Notably, about 80% of the recom- 
bination events in the left half of the rDNA occurred within the first ten 
repeats from the left border (Fig. 1g and Supplementary Table 2), with 
about 30% taking place within repeat 1. Thus, there is a strong bias for 
recombination in the rDNA repeats very close to the array border. 

Because recombination in repetitive DNA can lead to NAHR, we
selected tetrads of pch2Δ mutants that had undergone recombination
in the rDNA, and determined the resulting rDNA repeat number
between the URA3 insertion and the left rDNA boundary. In 70%
(n = 47) of the tetrads investigated from different URA3 integrants,
we detected changes in repeat number, ranging from 1 to 19 repeats
(Fig. 1i, j and Supplementary Table 2), demonstrating that rDNA
crossovers in pch2Δ cells are frequently associated with NAHR.
Figure 1 | Ribosomal-DNA-associated DSB formation and recombination.
a, CHEF analysis of the rDNA of meiotic dmc1Δ (H5217), sir2Δ dmc1Δ
(H2953) and pch2Δ dmc1Δ (H5216) cells. The schematic shows the analysed
XhoI restriction fragment. A dmc1Δ mutation was used to prevent DSB repair.
The mean (± s.e.m.) of five experiments is shown. Significance was assessed by
one-tailed Student's t-test: dmc1Δ versus sir2Δ dmc1Δ, P value = 0.00122;
dmc1Δ versus pch2Δ dmc1Δ, P value = 0.254; pch2Δ dmc1Δ versus
sir2Δ dmc1Δ, P value = 0.00216. b, Schematic of markers inserted in unique
single-copy sequences within 500 base pairs (bp) of the rDNA, and crossover
rates in wild-type (WT; H3026; n = 467) and pch2Δ (H3027; n = 186) cells.
c, Schematic indicating marker locations in the rDNA used in d and g. URA3
markers were inserted in the NTS 1/2 region of the indicated repeats. d, Southern
blot for restriction fragments containing the indicated insert-associated rDNA
repeat units from pch2Δ dmc1Δ strains H5622 (repeat 1), H5636 (repeat 3) and
H5706 (repeat 10). DNA was digested with PflMI and SalI and probed for the
unique rDNA insertion. The YCR047C probe (on DNA digested with HindIII)
was a positive control for DSB formation. e, Southern blot of the left rDNA flank,
including the outermost rDNA repeat, in dmc1Δ (H5583) and pch2Δ dmc1Δ
(H5622) cells (SalI and NruI digest; probe YLR152C). Positions of open reading
frames are shown schematically alongside the Southern blot. f, Enrichment
profile of ssDNA in chromosome XII in dmc1Δ (H118, light grey), pch2Δ dmc1Δ
(H2629, black) and sir2Δ dmc1Δ (H2953, dark grey) cells. Arrowheads indicate
>twofold increased DSB formation in pch2Δ dmc1Δ compared to dmc1Δ cells.
g, Tetrad analysis of pch2Δ URA3-rDNA-insertion strains H4611 (repeat 1),
H4613 (repeat 3), H3823 (repeat 10), H4612 (repeat 12), H3820 (repeat 29) and
H3821 (repeat 49; Supplementary Table 2). Recombination rates between URA3
and rDNA-flanking markers are shown in relation to the physical URA3
positions in the rDNA. h, Relative contribution of each measured interval
indicated in g to total rDNA recombination (percentage of crossovers (CO) per
kb of interval). i, Incidence of changes in rDNA repeat number between the
URA3 insertion and the rDNA boundary in pch2Δ tetrads that had undergone
crossover recombination. j, CHEF analysis of two tetrads that have undergone
unequal recombination with parental controls. DNA was digested with XhoI
and probed with NTS1. Par., parental controls; Rec., recombinants.
However, the prevalence of allelic recombinants (30%) and the fact
that changes in repeat number encompassed less than 20% of the
approximately 100-110 rDNA repeats in our strains indicate that
the homology search for DSB repair in the outermost rDNA repeats
in pch2Δ cells is restricted to close neighbours. Notably, the
distribution of changes in repeat number closely matched the pattern of
crossover events (Fig. 1h, i). This congruence indicates that the spread 
in the crossover distribution (Fig. 1g, h) can largely be accounted for by 
non-allelic exchanges between rDNA repeats originating from DSBs in 
the outermost repeats of the array. Finally, although rDNA exchanges 
occurred at a much lower frequency in wild-type cells, they were also 
associated with NAHR (Supplementary Table 2), indicating that Pch2 
primarily acts to prevent NAHR by suppressing DSB formation. These 
results establish the rDNA borders as high-risk regions for meiotic 
NAHR. 

To determine how Pch2 suppresses DSB formation near the rDNA,
we measured the chromosome association of DSB-related factors at
the time of DSB formation. The three essential DSB factors that we
were able to analyse by chromatin immunoprecipitation (ChIP),
Rec114, Mer2 and Mre11, were specifically enriched near the rDNA
and HML in pch2Δ cells, mirroring the changes in DSB formation
(Fig. 2 and Supplementary Fig. 3). We then investigated whether the
regional exclusion of DSB factors could be explained by local depletion
of the DSB-promoting chromosome-axis protein Hop1, the cytological
distribution of which is affected by Pch2 (refs 4, 13). Although Hop1
binding was slightly increased near the rDNA and HML in pch2Δ cells,
it was abundant even in wild-type cells, indicating that Pch2 does not
regulate the initial chromosomal recruitment of Hop1. Rather, the
differences in Hop1 binding that we observed might reflect an effect
of Pch2 on chromosome structure. Finally, although DSBs are
enriched in promoters containing histone H3 lysine 4 trimethylation
(H3K4me3), we saw no difference in the genome-wide levels of this
modification with or without Pch2 (Fig. 2c and Supplementary Figs 3d
and 4), indicating that Pch2 does not influence this chromatin
modification. These findings indicate that Pch2 specifically blocks the stable
recruitment of DSB factors to prevent local DSB formation.

We sought to identify proteins that collaborate with Pch2 in preventing
rDNA-proximal DSBs. A yeast two-hybrid screen isolated a fragment
of the Orc1 protein, containing its ATPase domain, as a Pch2 interactor
(Fig. 3a). This interaction was confirmed by co-immunoprecipitation
(Fig. 3b and Supplementary Fig. 5a). Orc1 is a component of the
conserved origin recognition complex that has several important
chromosomal functions, including the loading of the replicative helicase.
Impairing Orc1 protein levels by a temperature-sensitive orc1-161
mutation (Supplementary Fig. 5b) triggered DSB formation in the rDNA
flanking regions, similarly to the loss of PCH2 (Fig. 3c, d). DSB formation
near the rDNA occurred even at a temperature (23 °C) that is permissive
for pre-meiotic DNA replication and spore viability (Fig. 3c-e and
Supplementary Fig. 5c, d). Similarly, we saw increased DSB levels near
the rDNA in an orc1 mutant lacking the N-terminal bromo-adjacent
homology (BAH) domain that is required for the chromatin-silencing
function of Orc1 (ref. 17), but is dispensable for DNA replication (Fig. 3f
and Supplementary Fig. 5e, f). These data indicate that the regulatory
roles of Orc1 in DSB formation and bulk DNA replication are separable,
although we cannot rule out that the analysed orc1 mutations affect
rDNA replication locally. During meiosis, Pch2 concentrates in the
nucleolus, the organelle assembled on the rDNA array. In orc1-161
cells, the recruitment of Pch2 to the nucleolus was impaired, despite
normal levels of cellular Pch2 (Fig. 3g, h). Both Pch2 and Orc1 belong
to the AAA+ family of ATPases that often function as multimeric
complexes, and we found that the ATPase activity of Pch2 was
required to prevent rDNA-proximal DSBs (Supplementary Fig. 5g).
These data define a role for Orc1 in the nucleolar recruitment and
possible activation of Pch2 to prevent local DSB formation.

To find out whether the specific DSB activity at the rDNA borders in
pch2Δ mutants was linked to the presence of the rDNA itself, we
Figure 2 | Association of the meiotic DSB machinery near the rDNA.
a, Enrichment of ssDNA in a region flanking the right border of the rDNA on
chromosome XII (Saccharomyces genome database (SGD) coordinates) in
dmc1Δ (H118, grey) and pch2Δ dmc1Δ (H2629, black) cells. Arrowheads indicate
>twofold increased DSB formation in pch2Δ dmc1Δ compared to dmc1Δ cells.
b, ChIP-chip analysis in same region as in a, for the proteins Rec114-13Myc (first
panel; wild type (H4890), pch2Δ (H4893)), Mer2-5Myc (second panel; wild type
(H5917), pch2Δ (H5916)), Mre11-13Myc (third panel; wild type (H5547), pch2Δ
(H5947)) and Hop1 (fourth panel; wild type (H119), pch2Δ (H2817)), in
wild-type (grey) and pch2Δ (black) cells. c, Average enrichment for the different PCH2
and pch2Δ ssDNA and ChIP data sets (see a, b and Supplementary Fig. 4) within
the 50 kb flanking the right rDNA border (see Methods for genomic coordinates).
The genome-wide average is indicated by the dotted line.


deleted the rDNA array from its genomic location. In these strains, 
loss of PCH2 no longer allowed DSB formation in the flanking regions 
(Figs 4a, b), demonstrating an intrinsic DSB-promoting activity in the 
rDNA. To investigate whether the rDNA was sufficient to promote 
DSB formation, we created a translocation between chromosomes II 
and XII that exchanged the rDNA-distal portion of chromosome XII 
with a portion of chromosome II (Fig. 4c). In these strains, DSB levels 
were no longer increased in the former right flank of the rDNA (now 
flanked by chromosome II sequences; Fig. 4d), but notably, increased 
DSB formation was observed on the chromosome II sequences that, 
after translocation, were flanking the rDNA (Fig. 4e). Thus, in the 
absence of PCH2, the rDNA is necessary and sufficient to promote 
DSB formation. 


1 SEPTEMBER 2011 | VOL 477 | NATURE | 117 


©2011 Macmillan Publishers Limited. All rights reserved 


LETTER 


[Figure 3 image, panels a-h. Recoverable labels: a, Pch2 and Orc1 domain schematic (BAH and ATPase domains) with yeast two-hybrid clones; b, Input and Anti-HA IP lanes; c and f, Southern blot probes YLR164W and YCR047C; d, x axis 'Chromosome XII position (kb)'; g, Pch2, Nop1, DNA and Merge channels.]
Figure 3 | Orc1 and Pch2 collaborate to suppress DSB formation.
a, Schematic of Pch2 and Orc1 proteins, indicating Orc1 clones identified by
yeast two-hybrid screen. b, Co-immunoprecipitation (IP) between
haemagglutinin-tagged Pch2 (3HA-Pch2) and Orc1 in wild-type (H119) and
3HA-PCH2 (H3463) cells. Fpr3 and histone H3 are controls for the nucleolar
and chromosomal fractions, respectively. c, Southern blots of the right rDNA
flank and YCR047C in meiotic dmc1Δ (H118) and orc1-161 dmc1Δ (H4952)
cells, grown at the indicated temperatures. Arrowheads indicate broken DNA
fragments. d, Profiles of ssDNA in the region flanking the right rDNA border in
orc1-161 dmc1Δ (H5137, black) and dmc1Δ (H118, grey) cells, grown at 23 °C.
e, DNA-content analysis of meiotic dmc1Δ (H118) and orc1-161 dmc1Δ
(H5137) cells grown at 23 °C. 2C and 4C refer to unreplicated and replicated
diploid DNA contents, respectively. f, Southern blots of the right rDNA flank
and YCR047C in ORC1 dmc1Δ (H5838) and orc1-Δbah dmc1Δ (H5865) cells.
g, Immunofluorescence of chromosome spreads stained for Pch2 (HA, green),
Nop1 (nucleolar marker, red) and DNA (blue) in 3HA-PCH2 (H3463) and
3HA-PCH2 orc1-161 (H5033) cells at 3 h after meiotic induction at 23 °C. Scale
bar, 2 µm. h, Western blot analysis showing Pch2 (HA) expression in
3HA-PCH2 (H3463) and 3HA-PCH2 orc1-161 (H5033) cells. Pgk1 is used as loading
control.


The rDNA is assembled into specialized, Sir2-dependent
chromatin, and we asked how this chromatin state influenced DSB
formation at the rDNA boundaries. Notably, loss of Sir2 protein or loss of
its deacetylase activity largely eliminated DSB formation in the
rDNA flanking regions in pch2Δ mutants (Fig. 4f, g). Thus, although
Figure 4 | Ribosomal-DNA chromatin promotes DSB formation.
a, Schematic of the rDNA deletion strain (rdnΔΔ) and ssDNA profiles of the
region flanking the right rDNA border in pch2Δ dmc1Δ (H2629, black) and
pch2Δ rdnΔΔ dmc1Δ (H4737, grey) cells. b, Southern blot analysis of the right
rDNA flank in dmc1Δ (H118), pch2Δ dmc1Δ (H2629), rdnΔΔ dmc1Δ (H4736)
and pch2Δ rdnΔΔ dmc1Δ (H4737) cells. Arrowheads indicate broken DNA
fragments. c, Strategy used to generate chromosomal translocations between
chromosomes XII and II. d, e, Profiles of ssDNA in strains containing the XII;II
translocation. In d, the depicted region is next to the rDNA in pch2Δ dmc1Δ
(H2629) cells (dark blue) and next to the left arm of chromosome II in
pch2Δ t(II;XII) dmc1Δ (H4798) cells (light blue). In e, the depicted region is
located on chromosome II in pch2Δ dmc1Δ cells (orange) and next to rDNA in
pch2Δ t(XII;II) dmc1Δ cells (dark red). f, Southern blot of the left rDNA flank
(HindIII digest; ARS1216 probe) and of YCR047C in dmc1Δ (H118),
pch2Δ dmc1Δ (H2629), sir2Δ dmc1Δ (H2953) and pch2Δ sir2Δ dmc1Δ
(H3038) cells. Positions of open reading frames are shown schematically
alongside the Southern blot. g, Southern blot of the right rDNA flank and
YCR047C in pch2Δ sir2Δ dmc1Δ (H3262), pch2Δ sir2Δ dmc1Δ leu2::SIR2
(H3261) and pch2Δ sir2Δ dmc1Δ leu2::sir2-345 (H3282) cells. Arrowheads
indicate broken DNA fragments.


Sir2-dependent heterochromatin suppresses meiotic DSBs within the 
rDNA array (refs 2, 3 and Fig. 1a), it has a profound DSB-promoting 
effect on the rDNA borders that is counteracted by Pch2 and Orc1. It is 
also notable that Sir2 itself localizes Pch2 to the nucleolus, reflecting 
an elegant coupling mechanism that maintains meiotic stability across 
the entire rDNA. The double dependence of Pch2 on Sir2 and Orc1 
may promote Pch2 enrichment at the nucleolus, analogous to the 
bimodal recruitment mechanisms that restrict localization of Aurora 
B and shugoshin to centromeres. 

Although repeat-associated chromatin marks differ substantially 
between organisms and even among individual loci, the assembly of 
heterochromatin on repetitive DNA arrays is a common strategy to 
protect the genome against destabilization caused by errors in meiotic 
recombination. Our results establish borders between heterochromatin 
and euchromatin as potential high-risk regions for meiotic DSB formation 
and NAHR, and reveal the existence of a secondary border-specific 
system that shields against these events. Buffer zones like those established 
by Pch2 and Orc1 may need to be broad, because even DSBs 
adjacent to repetitive DNA can trigger NAHR. Given the prominent 
presence of repetitive DNA arrays in genomes ranging from yeast to 
man, we propose that mechanisms that limit DSB activity around 
repetitive DNA might be a widespread phenomenon. 


METHODS SUMMARY 


Yeast strains were of the SK1 background and are listed in Supplementary Table 3. 
Analysis of single-stranded DNA profiles and ChIP-chip analysis were performed 
as previously described. These and other standard techniques used are detailed 
in the Methods. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 


Yeast strains and two-hybrid analysis. All yeast strains used in this study were 
constructed in the SK1 background and are listed in Supplementary Table 3. 
Epitope tags and gene disruptions were introduced by standard PCR-based 
transformation. The orc1-Δbah mutant was generated using a plasmid encoding a 
truncated version of Orc1 (amino acids 235-914; pSPB1.48, gift from S. P. 
Bell). To create URA3 insertions in the rDNA, cells were transformed with a 
pRS306-NTS1/2 plasmid linearized with SphI (this plasmid contains a 2341-bp 
fragment harbouring the intergenic rDNA sequences, NTS1 and NTS2, ligated into 
the BamHI and EcoRI sites of pRS306). Insertion sites in the rDNA were mapped 
by CHEF gel analysis using a unique XhoI site in the inserted sequence (XhoI does 
not cut in the rDNA), and suitable clones were selected for further analysis. SK1 
strains lacking the rDNA array were generated as described in ref. 25. Briefly, cells 
were transformed with a very high-copy rDNA plasmid (pRDN-hyg::URA3::leu2-8) 
carrying a recessive point mutation that confers resistance to hygromycin. After 
selection on hygromycin, a clone was selected that had lost all but three repeats of 
the rDNA array through spontaneous deletion, as determined by CHEF gel analysis. 
The remaining rDNA copies were subsequently deleted by conventional gene 
disruption using a HIS3 deletion cassette. Complete deletion of the rDNA array 
was confirmed by Southern blotting. Chromosomal translocations between 
chromosomes XII and II were generated essentially as previously described. Briefly, 
plasmids containing a promoter-less ADE2 gene adjacent to a loxP site 
(loxP-ADE2::natMX4) and a GPD promoter (pGPD) with an adjacent loxP site 
(pGPD-loxP::hphMX4; both plasmids were gifts from N. Hunter) were integrated 
at YLR162W-A and LYS2, respectively. After induction of Cre recombinase from a 
pGAL-Cre plasmid (N. Hunter), cells were selected that had undergone translocation 
between LYS2 (chromosome II) and YLR162W-A (chromosome XII). 
Translocation was confirmed by Southern blot analysis. To identify interactors 
of Pch2 by two-hybrid screen, full-length PCH2 was amplified from genomic DNA 
and the intron was removed by site-directed mutagenesis. The Pch2 coding 
sequence was then cloned into pGBDU-C1, and the resulting bait plasmid was 
used to screen libraries in all three reading frames. 

Synchronous meiosis. Cells were grown for 24 h in yeast peptone dextrose (YPD) 
at 23 °C then diluted in BYTA medium (1% yeast extract, 2% tryptone, 1% 
potassium acetate, 50 mM potassium phthalate) to an optical density at 600 nm 
(OD600) of 0.3 (OD600 = 0.5 for orc1-161, rdnΔΔ and XII;II translocation strains), 
and grown for 16 h at 30 °C (or for 18 h at 23 °C in the case of 
temperature-sensitive strains). After two washes in water, cells were diluted into SPO medium 
(0.3% potassium acetate) at OD600 = 1.9 and incubated at 30 °C unless otherwise 
stated. 
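
The dilution to a target optical density described above can be sketched with the usual C1 × V1 = C2 × V2 relation. This is only an illustration: the function name, the starting density of 6.0 and the 50-ml volume are hypothetical and do not come from the paper; only the target of 0.3 is taken from the text, and the calculation assumes that optical density scales linearly with cell density.

```python
def culture_volume_ml(measured_od600: float, target_od600: float,
                      final_volume_ml: float) -> float:
    """Volume of starting culture (ml) that gives final_volume_ml at target_od600.

    Uses C1 * V1 = C2 * V2 and assumes OD600 is proportional to cell density,
    which holds only approximately and at moderate densities.
    """
    if min(measured_od600, target_od600, final_volume_ml) <= 0:
        raise ValueError("all inputs must be positive")
    if target_od600 > measured_od600:
        raise ValueError("target is denser than the culture; concentrate the cells instead")
    return target_od600 * final_volume_ml / measured_od600


# Hypothetical example: a YPD culture read at OD600 = 6.0 (assumed value),
# diluted into 50 ml of BYTA at the target OD600 of 0.3 given in the text.
inoculum_ml = culture_volume_ml(6.0, 0.3, 50.0)
medium_ml = 50.0 - inoculum_ml
print(f"{inoculum_ml:.2f} ml culture + {medium_ml:.2f} ml medium")
```

The function refuses targets denser than the starting culture, because reaching those (as for washed cells resuspended in SPO medium) means concentrating cells rather than diluting them.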

Isolation of ssDNA. For ssDNA analysis”’”°, about 10° cells were fixed in 70% 
ethanol at −20 °C at 0 h and 5 h after induction of meiosis. After spheroplasting in 
sorbitol buffer (1 M sorbitol, 1% β-mercaptoethanol, 0.2 mg ml⁻¹ zymolyase, and 
0.1 M EDTA, pH 7.4), cells were lysed in NDS buffer (0.6% SDS, 300 mM EDTA, 
10 mM Tris-HCl, pH 9.5). After treatment with proteinase K (0.25 mg ml⁻¹) and 
RNase A, the DNA was digested with EcoRI and ssDNA was then enriched by 
adsorption to BND-cellulose and eluted using 1.8% caffeine. This enriched ssDNA 
was subsequently used for microarray analysis. For this, 1.5 µg of the respective 0-h 
and 5-h ssDNA samples was labelled with Cy3-dUTP or Cy5-dUTP (GE 
Healthcare) by random priming without denaturation using 4 µg of random primer 
(Integrated DNA Technologies) and 10 units of Klenow (New England Biolabs). 
Western blotting and immunoprecipitation. For western blotting, 5 ml of meiotic 
cells were harvested at the indicated time points and resuspended in 5% 
trichloroacetic acid. After incubation on ice for 10 min, samples were washed in acetone and 
dried overnight. Samples were lysed by bead beating in a FastPrep FP120 (Thermo 
Scientific) in TE lysis buffer (10 mM Tris (pH 7.5), 1 mM EDTA, 2.75 mM 
dithiothreitol). SDS loading buffer (3×) was added and the pH of the sample was adjusted 
to neutral by addition of 1 M Tris (pH 8.0). Samples were separated by standard 
polyacrylamide gel electrophoresis. The following antibodies were used: anti-ORC 
(1108, 1:1,000, gift from S. P. Bell), anti-Fpr3 (1:1,000, gift from J. Thorner), anti-HA 
(3F10, 1:1,000, Roche), anti-Pgk1 (1:1,000, Invitrogen) and anti-histone H3 (1:1,000, 
Abcam). For immunoprecipitations, 50 ml of meiotic cells were harvested 3 h after 
induction of meiosis. Cells were diluted in 2× lysis buffer (20 mM HEPES (pH 7.5), 
4 mM MgCl2, 0.6 M glutamic acid, 0.32 M sorbitol, 4% glycerol, 0.5% Triton X-100) 
containing protease inhibitors, and lysed by bead beating. Extracts were sonicated 
and cleared by centrifugation. After removal of one tenth of the extract for an input 
sample, extracts were immunoprecipitated with 2 µl anti-HA (3F10, Roche) for 
3HA-Pch2, 2 µl anti-ORC (1108) and 2 µl anti-Fpr3, in combination with 20 µl of 
a 50% slurry of GammaBind-Sepharose beads (GE Healthcare) for 16 h at 4 °C. After 
five washes in 1× lysis buffer, 1× SDS loading buffer was added to the beads, and 
samples were analysed by western blotting with the indicated antibodies. 
Chromatin immunoprecipitation. Meiotic cells (25 ml) were harvested 3 h after 
induction of meiosis and fixed for 15 min in 1% formaldehyde. The formaldehyde 
was quenched by addition of 125 mM glycine. Samples were processed as 
previously described. Before immunoprecipitation, one tenth of the sample was 
removed as input sample. The antibodies used for immunoprecipitation were: 
2 µl anti-Myc (9E11, Abcam; for Rec114-13Myc, Mer2-5Myc and 
Mre11-13Myc), 2 µl anti-Hop1 (gift from N. Hollingsworth), 2 µl anti-histone-H3 
(AB1791, Abcam) and 2 µl anti-H3K4me3 (AB8580, Abcam), in combination 
with 20 µl of a 50% slurry of GammaBind-Sepharose beads (GE Healthcare). 
Half of the ChIP sample and one tenth of the input were labelled with Cy3-dUTP 
or Cy5-dUTP (GE Healthcare) as described in the ssDNA protocol, with 
the difference that the DNA was denatured for 5 min at 95 °C before the extension 
reaction. 

Microarray analysis. After removal of unincorporated dyes, Cy3- and 
Cy5-labelled samples were hybridized to custom 4 × 44K tiled genomic yeast 
microarrays (Agilent Technologies) for 16 h at 65 °C. Levels of Cy3 and Cy5 were 
calculated with the Agilent Feature Extractor CGH software. Background 
normalization, log2 ratios for each experiment and scale normalizations between 
experiments were calculated with the sma package in R (v2.1.0, 
http://www.r-project.org). Each data set is an average of two experiments. For comparison between 
isogenic wild-type and pch2Δ cells, data sets were scale-normalized. To analyse the 
distribution of H3K4me3, we normalized the enrichment to that of total histone 
H3 generated from the same extracts, by subtracting the log2 ratios. To measure 
the average ssDNA or ChIP enrichment in different chromosomal regions, the 
following SGD coordinates were analysed, on the basis of the positions of available 
array features: 

50 kb right of the rDNA: XII, 490,531-540,530. 

50 kb left of the rDNA: XII, 401,371-451,370. 

Rest of chromosome XII: XII, 1-401,370 and 540,531-1,078,177. 

First 100 kb of chromosome III: III, 1-100,000. 

Rest of chromosome III: III, 100,001-316,620. 

Chromosome VIII: VIII, 1-562,643. 
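
The log2-ratio steps and regional averaging described above can be illustrated with a short sketch. It is not the sma/R pipeline used in the study: the function names and the toy feature values are hypothetical, background and scale normalization are omitted, and only the chromosome XII coordinates are copied from the list above.

```python
import math
from statistics import mean

# Regions from the coordinate list: (chromosome, start, end), inclusive.
REGIONS = {
    "50 kb right of rDNA": [("XII", 490_531, 540_530)],
    "50 kb left of rDNA": [("XII", 401_371, 451_370)],
    "rest of chromosome XII": [("XII", 1, 401_370), ("XII", 540_531, 1_078_177)],
}


def log2_ratio(sample_signal: float, reference_signal: float) -> float:
    """log2 of sample over reference (for example ChIP over input)."""
    return math.log2(sample_signal / reference_signal)


def normalize_to_h3(h3k4me3_log2: float, h3_log2: float) -> float:
    """Normalize H3K4me3 to total histone H3 by subtracting log2 ratios."""
    return h3k4me3_log2 - h3_log2


def mean_enrichment(features, intervals) -> float:
    """Average the log2 values of array features that fall inside the intervals.

    features: iterable of (chromosome, position, log2_value) tuples.
    """
    values = [
        value
        for chrom, pos, value in features
        if any(chrom == c and start <= pos <= end for c, start, end in intervals)
    ]
    if not values:
        raise ValueError("no array features in the requested region")
    return mean(values)


# Toy data (hypothetical values, not measurements from the paper).
features = [
    ("XII", 500_000, 1.0),
    ("XII", 530_000, 3.0),
    ("XII", 100_000, 0.5),
    ("III", 500_000, 9.0),
]
print(mean_enrichment(features, REGIONS["50 kb right of rDNA"]))
```

Subtracting log2 ratios is equivalent to dividing the underlying ratios, which is why the H3K4me3 normalization is a simple difference.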

Chromosome spreads and immunofluorescence. Meiotic cells were spread as 
described previously. Cells were spheroplasted at 37 °C in solution 1 (2% potassium 
acetate, 0.8% sorbitol, 10 mM dithiothreitol, 130 mg ml⁻¹ zymolyase 100T 
(Seikagaku)). Solution 2 (100 mM MES (pH 6.4), 1 mM EDTA, 0.5 mM MgCl2, 
1 M sorbitol) was added to stop spheroplasting. Spheroplasted cells (15 µl) were 
fixed with 30 µl of fixative solution (4% paraformaldehyde, 3.4% sucrose) and lysed 
with 60 µl 1% lipsol. After addition of 60 µl fixative solution, cells were spread using a 
glass rod. After drying, the slides were blocked in blocking buffer (0.2% gelatine, 0.5% 
BSA in PBS) and stained with the following antibodies: anti-HA (3F10, 1:500 
dilution, Roche), and anti-Nop1 (1:500 dilution, Encor Biotechnology). 

CHEF gel electrophoresis and Southern blotting. Chromosome fragments for 
CHEF analysis were prepared by restriction digest in agarose plugs. Briefly, 20 ml 
of meiotic cells were killed by addition of sodium azide (0.1% final w/v), pelleted 
and stored on ice for the duration of the time course. Cell pellets were washed twice 
in CHEF-TE (10 mM Tris-HCl (pH 7.5), 50 mM EDTA) and resuspended in 
300 µl CHEF-TE. Tubes were individually treated as follows: 4 µl zymolyase 
T100 (10 mg ml⁻¹) was added and the mix was incubated at 42 °C for 30 s before 
addition of 500 µl low-melting-point agarose (1.5% SeaPlaque GTG, 125 mM 
EDTA) at 42 °C. Gel plugs (90 µl) were allowed to harden on ice in disposable 
plug molds (Bio-Rad) and incubated overnight at 37 °C in 300 µl LET (10 mM Tris 
(pH 7.5), 500 mM EDTA) per plug. Plugs were deproteinized overnight at 50 °C in 
200 µl NDS-PK (LET, 1% N-lauroylsarcosine, 1 mg ml⁻¹ proteinase K (Amresco)) 
per plug. Proteinase K was inactivated by incubating plugs for 1 h at 4 °C in 
CHEF-TE containing 1 mM PMSF, and plugs were washed three times in CHEF-TE, then 
digested with XhoI in digestion buffer containing 5 mM spermidine. To analyse 
the entire rDNA array, digested chromosomes were separated by CHEF gel 
electrophoresis in 1% agarose in 0.5× TBE, 6 V cm⁻¹, using 60-s pulses for 15 h and 
90-s pulses for 9 h. For fine mapping of rDNA insertions and to analyse changes in 
repeat number, digested chromosome fragments were separated using a 5-20 s 
ramp over 20 h. For conventional electrophoresis, DNA fragments were separated 
on 0.6% agarose in 1× TBE and transferred onto Hybond-XL membranes (GE 
Healthcare) by alkaline transfer. Southern blotting was performed as previously 
described and quantified with a Fujifilm BAS-2500 image reader V1.8 and Multi 
Gauge V2.2 software. 

Probes for Southern analysis. Probe templates for non-rDNA sequences were 
generated by nested PCR and gel purification. The following probes (SGD coor- 
dinates) were used: 

YLR164W: XII, 493,432-493,932. 

YCR047C: III, 209,361-210,030. 

ARS1216: XII, 450,407-451,150. 

YLR152C: XII, 443,849-444,910. 

NTS1: XhoI and XbaI digest of pRS306-NTS1/2. This probe detects all NTS1 and 
NTS2 sequences (pan-rDNA probe). 


©2011 Macmillan Publishers Limited. All rights reserved 


rDNA insertion: BciVI digest of pRS306-NTS1/2. This probe specifically detects 
the plasmid backbone of the URA3 insertion cassette. 

TOM1/YDR457W: IV: 1,370,714-1,371,733 

Flow cytometry. At the indicated time points, 150 µl of meiotic cells were fixed for 
2 h at 4 °C after addition of 350 µl absolute ethanol. Cells were resuspended in 
500 µl of 50 mM sodium citrate containing 0.7 µl RNase A (30 mg ml⁻¹, Sigma). 
Cells were incubated for 2 h at 50 °C. 10 µl proteinase K (Amresco) was added and 
cells were deproteinated for 2 h at 50 °C. 500 µl of 50 mM sodium citrate 
containing 0.2 µl Sytox Green (Amersham) was added to the cells. Cells were briefly 
sonicated and analysed using a FACScalibur (Becton Dickinson) flow cytometer. 
DNA profiles were generated using CellQuest software. 

Recombination mapping. To determine crossover recombination rates, cells 
were sporulated for 24 h in 3 ml SPO and treated with zymolyase (1 mg ml⁻¹ in 
1 M sorbitol) to remove ascus walls. Tetrads were dissected by micromanipulation 
and marker segregation was determined by replica plating on appropriate selective 
media. For mapping within the rDNA, only tetratypes were used to calculate 
recombination rates, to avoid distortions originating from non-parental ditypes 
that were probably the result of previous mitotic recombination. 
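
As a rough illustration of how tetrad classes translate into a map distance, the sketch below applies the standard Perkins formula, cM = 100 × (TT + 6 × NPD) / (2 × total tetrads). The text does not give the exact calculation used for the rDNA intervals, so the `tetratypes_only` option, which simply drops the non-parental-ditype term, is an assumption about one possible reading; the function name and the tetrad counts are hypothetical.

```python
def map_distance_cm(pd: int, npd: int, tt: int, tetratypes_only: bool = False) -> float:
    """Map distance in centimorgans from tetrad counts.

    pd, npd, tt: numbers of parental ditype, non-parental ditype and
    tetratype tetrads. With tetratypes_only=True the NPD term of the
    Perkins formula is ignored (an assumed simplification, not the
    authors' documented formula).
    """
    if min(pd, npd, tt) < 0:
        raise ValueError("tetrad counts cannot be negative")
    total = pd + npd + tt
    if total == 0:
        raise ValueError("at least one tetrad is required")
    recombinant_term = tt if tetratypes_only else tt + 6 * npd
    return 100.0 * recombinant_term / (2.0 * total)


# Hypothetical counts for illustration only.
print(map_distance_cm(pd=80, npd=2, tt=18))
print(map_distance_cm(pd=80, npd=2, tt=18, tetratypes_only=True))
```

The factor of 6 on NPD tetrads in the Perkins formula accounts for double crossovers, so dropping that term lowers the estimate whenever NPD tetrads are present.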
CORRECTIONS & AMENDMENTS 


ADDENDUM 
doi:10.1038/nature10396 


Post-traumatic stress disorder is associated with PACAP and the PAC1 receptor 


Kerry J. Ressler, Kristina B. Mercer, Bekh Bradley, Tanja Jovanovic, Amy Mahan, Kimberly Kerley, Seth D. Norrholm, 
Varun Kilaru, Alicia K. Smith, Amanda J. Myers, Manuel Ramirez, Anzhelika Engel, Sayamwong E. Hammack, Donna Toufexis, 


Karen M. Braas, Elisabeth B. Binder & Victor May 


Nature 470, 492-497 (2011) 


Half of the data points were inadvertently omitted from the published 
version of Fig. 4a; the statistical analyses in the text and figure legend, 
however, do refer to the complete data set. The corrected figure is shown 
here and has been corrected in the online versions of the paper. 

In addition, we present additional information to clarify two results 
reported in the Article regarding plasma pituitary adenylate cyclase- 
activating polypeptide (PACAP) levels and post-traumatic stress 
disorder (PTSD) symptom associations. In the Article, we reported 
replication of the association between PACAP levels and hyperarou- 
sal subscale, because this was the most robust association in the initial 
cohort. We now present the analyses separately for initial, replication 
and combined cohorts in Table 1. All associations but one are signifi- 
cant in the replication cohort. The second issue concerns potential 
medical confounds that could underlie the reported association. 
Although we do not have medical chart data on most patients, we 
do have responses from a health questionnaire administered during 
collection of trauma history and other data. We have now reanalysed 
the associations for the PTSD symptom scale (PSS) hyperarousal 
and total symptoms using subjective reports of health condition 


Table 1 | PACAP associations with PTSD symptoms 


[Corrected Fig. 4a. Axis labels: PTSD symptom scale; ADCYAP1R1 methylation (β value), 0 to 0.1.] 

from the questionnaires as covariates. These data are presented in 
Table 2 and do not show any effect of health- and illness-related 
questions on the relationship between PACAP and PTSD symptoms. 
None of these additions affect the results and conclusions of the 
original Article. 


1. Diagnostic and Statistical Manual of Mental Disorders 4th edn (text rev.) (American 
Psychiatric Association, 2000). 


                                                                                  Initial   Replication   Combined
                                                                                  n = 34    n = 74        n = 108
**Correlation of PACAP level with PSS hyperarousal score                          0.006     0.015         0.001
*High PACAP level with PSS hyperarousal score (Fig. 1c)                           0.001     0.01          0.0004
*High PACAP level with PSS hyperarousal score (Fig. 1d, adjusted)                 0.01      0.001         0.00005
*High PACAP level with PSS total score (Fig. 1b)                                  0.0003    0.15          0.008
*High PACAP level with PSS total score (adjusted)                                 0.004     0.04          0.002
*High PACAP level with clinically significant PTSD symptoms (Fig. 1d, adjusted)   0.0002    0.01          0.0003
*High PACAP levels with PSS-based PTSD diagnosis¹ (adjusted)                      0.01      0.02          0.0008

The table shows P values of correlations and regression analyses for the initial, replication and combined cohort analyses: 2-tailed for initial and combined; 1-tailed for replication. 
‘Adjusted’ means adjusted for age, substance use and trauma. 
** Bivariate correlation. 
* Analysis of variance (ANOVA). 


Table 2 | Health-adjusted PACAP associations 


                                                                                  Initial   Replication   Combined
                                                                                  n = 28    n = 58        n = 86
Health-adjusted association of high PACAP levels with PSS hyperarousal score      0.031     0.005         0.001
Health-adjusted association of high PACAP levels with PSS-based PTSD diagnosis¹   0.05      0.03          0.002


The table shows P values of correlations and regression for the initial, replication and combined cohort analyses: 2-tailed analyses for initial and combined; 1-tailed for replication. 
The opening of the Scripps Research Institute’s campus in Jupiter, Florida, was the start of an influx of bioscience to the state, a flow that has now slowed. 


Second thoughts 


The sunshine state’s rush to become a bioscience player 
started with a bang. Now it faces more realistic expectations. 


BY SARAH KELLOGG 


Eight years ago, Florida’s officials wanted 
to transform the state into a bioscience 
powerhouse, hoping that it would 
become known for more than just beachfront 
retirement apartments and cartoon-themed 
amusement parks. So they launched 
a bold initiative to swiftly build world-class 
bioscience clusters in the region. The plan has 
cost the state and local governments more than 
US$1.5 billion, sparked significant growth in 
Florida's once-sleepy biotechnology industry, 
and left many expecting more. The promise 
of a robust industry with new jobs, research 
funding, patentable discoveries and royalties 
has yet to be fully realized. That is frustrating 
for a state grappling with a lacklustre economy 
and severe budget shortfalls. 

Nevertheless, Florida’s bioscience invest- 
ment has brought in some recruitment, fed- 
eral funding, public-private partnerships and 
start-ups, albeit more slowly than many would 
like. “Florida was not much on the radar of the 
international scientific community when all this 
started,” says Claudia Hillinger, vice-president 
for institute development at the Max Planck 
Florida Institute in Jupiter, a brain-research 
centre and the first US campus of the German 
Max Planck Society. “That has changed in the 
last three years. Things are starting to develop 
and evolve, but I think we're going to need a bit 
more patience from everyone,” she adds. 


AN AMBITIOUS START 

When Florida launched its effort to promote 
bioscience in 2003, the US economy was thriving. 
Then-Governor Jeb Bush (Republican) 
was on a personal mission to restructure 
Florida's financial landscape for the twenty- 
first century, and the state was flush with pri- 
vate and public funds. 

The initiative — comprising generous 
government subsidies and aggressive recruit- 
ment — proceeded quickly. Life-science clus- 
ters, mostly in central and southern Florida, 
emerged in cities commonly associated with 
orange groves and sandy beaches (see Nature 
446, 1112-1113; 2007). Respected senior 
researchers and promising postdocs headed 
to the region, their ambitions limited only by 
their imaginations. State and local incentives, 
including direct grants and tax breaks, drew 
prestigious research institutions. The Scripps 
Research Institute, a biomedical-research 
centre headquartered in La Jolla, California, 
received $579 million in combined subsidies 
to open Scripps Florida in Jupiter; the Sanford- 
Burnham Institute for Medical Research, also 
headquartered in La Jolla, drew $310 million to 
open a centre in Orlando; and the Max Planck 
Society got $188 million for its Florida campus. 
Almost overnight, Florida shot from its bio- 
science infancy to adolescence. 

The Battelle/BIO State Bioscience Initia- 
tives 2010 report from Battelle, an independ- 
ent research firm based in Columbus, Ohio, 
showcases Florida's good early track record. 
The state ranks 6th in the country for the 
number of bioscience jobs held in 2008, with 
27,960; and 17th in terms of US National Insti- 
tutes of Health grants received in 2009, with 
$466 million. Florida's biotech jobs pay well, 
with an average salary of $55,264 in 2008, 
compared with $39,596 for private-sector jobs 
overall in the state, the report found. 

Scripps Florida was the first organization to 
come on board, with temporary labs opening 
in 2005 and a dedicated campus launching in 
2009. It showed that Florida was serious about 
bioscience, convinced other institutes to follow 
its example, and spurred interest in innovative 
partnerships with private companies hoping to 
exploit translational research. It currently has 
40 principal investigators and 450 staff mem- 
bers, and plans to increase those numbers to 
60 faculty members and 545 staff by 2014. It 
is seeking researchers with expertise in cancer 
biology, metabolism and ageing, and molecu- 
lar therapeutics, among others. “Since we've 
been here, $200 million of federal money has 
come to us and the state,” says Harry Orf, vice- 
president of scientific operations for Scripps 
Florida. “This is entirely new money coming 
into the state, and we continue to secure more.” 


HOBBLED BY THE ECONOMY 
Sustaining such intense growth is difficult. 
Many of the public officials who championed 
biotech investment have left office; federal 
research funding has diminished; and Florida's 
government is strapped for cash. The state is 
keeping its early financial commitments to 
institutes, but other investments have slowed. 
In 2010, the Florida legislature’s Office of 
Program Policy Analysis and Government 
Accountability released a report — Biotechnol- 
ogy Clusters Developing Slowly — examining 
the sector. It noted that in 2008, 57% of Florida's 
biotech employment was in the counties host- 
ing the research institutes, which have prom- 
ised the state at least 1,100 highly paid research 
positions. But the report pointed out that the 
investment “has not yet resulted in the growth 
of technology clusters in the counties where 
program grantees have established facilities”. 
These conclusions reflect the ravages of high 
expectations and a floundering economy. “I 
think we're looking at a different strategy going 
forward,” says Russell Allen, president and chief 
executive of BioFlorida, the state’s bioscience 
trade association. “I don’t anticipate we'll see 
hundreds of millions of incentive dollars any 
time soon. I’m not sure that would be our next 
best step anyway. We've proven we can build 
a research base. Now it’s about creating jobs.” 
In May, Florida lawmakers approved a 
2011-12 budget that allocated $3.48 billion to 
the state’s 11 public universities — a 4% drop 
from the 2010-11 budget. This is the fourth 
year in a row that universities have seen a cut 
in state funding, and they have had to imple- 
ment hiring freezes, tuition-fee increases and 
budget cuts. This year, to further trim state 
spending, Republican Governor Rick Scott 
ordered cuts of a total of $615 million from 
the overall proposed budget, including $6 mil- 
lion to build a University of Florida research 
facility in Lake Nona, $6 million to complete 
the applied science building at Florida State 
University in Tallahassee, and $6.3 million to 
build the University of Central Florida’s inter- 
disciplinary research and incubator facility in 
Orlando. He also vetoed $2 million in new 
money for obesity and diabetes research at 
the Sanford-Burnham Institute. 

“From a funding perspective, we all wish 
things were different,” says Ryan West, director 
of talent and economic development at 
the Florida Chamber of Commerce. “I heard 
no legislator take any pride or joy in reducing 
funding levels for any of these institutions or 
reducing the investment overall in the 
biosciences. Nobody is happy about it.” 

Funding anxieties seem to have contributed 
to the collapse of a deal in June, when, 
owing to a lack of state investment, the 
Jackson Laboratory genetic-research firm 
in Bar Harbor, Maine, withdrew a bid to 
build a lab in Sarasota County. “We were 
invited to submit a much-reduced proposal 
to the [state], but the amount 
available ... and the uncertainty of future 
funding, made such a venture too speculative 
to undertake,” says Charles Hewett, Jackson Lab’s executive 
vice-president, in a written statement. 

To encourage the private sector, the state is 
looking beyond direct financial support to reg- 
ulatory changes and tax incentives that could 
foster biotech growth. In early 2011, lawmakers 
approved an annual corporate income-tax 
credit, similar to an extant federal programme, 
for companies that invest in research and 
development in Florida. “Growing this sector 
occurs from a lot more than providing financial 
support,” says Stuart Doyle, a spokesman for 
Enterprise Florida in Orlando, the state’s main 
economic-development agency. 




A DEARTH OF VENTURE CAPITAL 

The biggest potential obstacle to sustained 
long-term growth in Florida biotechnology is 
a paucity of in-state venture-capital firms will- 
ing to invest in biotech start-ups. “The state 
works very hard to help companies like ours, 
but you're just not going to find the savvy level 
of investor you find in California or New York,” 
says Marilyn Bruno, founder and chief executive 
of Aequor, a start-up biotech company 
in Coral Gables. Bruno is seeking investors 
to develop environmentally friendly com- 
pounds that halt bacterial contamination on 
a wide variety of surfaces, ranging from boat 
hulls to skin. “Why should they invest in a risky 
biotech start-up they don't understand if they 
can buy condos on Miami Beach and make a 
mint?” she asks. 

Florida’s State Board of Administration has 
been authorized to invest up to 1.5% of the net 
assets of the state retirement-system trust fund 
in technology and high-growth investments. 
And in 2007, the state set aside $29.5 million 
for the Florida Opportunity Fund, created to 
underwrite in-state venture-capital firms. It 
has committed most of its resources to com- 
panies and hedge funds that support informa- 
tion technology, homeland security, defence 
and biotech start-ups in the state. But allowing 
the money to be used so broadly has diluted 
the fund’s influence, says Bruno. 


PORTENDING PROMISE 

Despite the obstacles, Florida's bioscience 
efforts are bearing fruit. A cluster is growing in 
Orlando, where the Sanford-Burnham Insti- 
tute opened an $85-million building in May; 
it anchors the Lake Nona Medical City, includ- 
ing the University of Central Florida College of 
Medicine, Nemours Children’s Hospital and the 
Veterans Administration Medical Center, both 
slated to open in 2012. Once completed, the 
Medical City is expected to employ some 30,000 
scientists, administrative staff and others. 

“If you simply look at Sanford-Burnham 
employees, about 300 people, that doesn’t 
seem like many jobs,” says Daniel Kelly, scientific 
director for the institute. But “the point 
is we're part of the engine that runs this entire 
cluster. It’s important for credibility,” he says. 
“It’s important for investment.” 

Cooperation between Florida universi- 
ties and institutes has helped to draw recruits 
from more established entities, says David 
Fitzpatrick, chief executive and scientific 
director of Max Planck Florida, who left Duke 
University in Durham, North Carolina, for his 
Florida post in January this year. “It wasn't easy 
for me, leaving Duke,” says Fitzpatrick. “But this 
was an opportunity to set up an institute and to 
take it in a direction that I think is very exciting.” 
Max Planck Florida expects to employ a total of 
150 staff members by 2015, with 12 directors 
and research-group leaders, 114 other scientific 
staff members and 24 in administration. 

Ultimately, the newness of Florida’s bio- 
sciences sector is both its greatest selling point 
and a notable hurdle, say observers. Regard- 
less of early and continuing growth, Florida's 
bioscience clusters are untested commodities, 
and it will take years to determine whether the 
state has turned a strategic investment into a 
flourishing industry. 


Sarah Kellogg is a freelance writer in 
Washington DC. 
The magical world 
of data sharing 


Andrew Peterman says scientists need to reach out. 


Backstage at Disneyland, the scene is 
not pretty. Think of Bambi’s mother 
dying or Simba’s father being killed 
in the wildebeest stampede — those heart- 
wrenching moments when the magic of your 
childhood starts to waver. Even though I was 
well into adulthood, that’s how I felt the first 
time I did backstage fieldwork at Disneyland. 
The enchantment slipped away as I watched 
the characters of my childhood tear their 
stuffed heads off, dripping in sweat. 

I recalled this experience three years ago, 
as I sat in my office at Walt Disney Imagi- 
neering — the science and technology divi- 
sion of the Walt Disney Company, based in 
Glendale, California. The song It’s a Small 
World played as I waited on hold for a col- 
league to check for energy-use data that I 
had requested. These data could help our 
scientific team to understand how people 
use buildings, and to make designs more 
efficient. We knew that closing the exte- 
rior doors of a retail building would reduce 
energy consumption, but we did not know by 
how much or how it would, say, affect traffic 
flow. My role as a scientist was to research 
and devise strategies to reduce energy con- 
sumption across Disney. 

But I and my team fought unsuccessfully 
to get company employees to divulge energy- 
use data. Most researchers have dealt with 
this problem, whatever their scientific field. 
Why, I wondered, must accessing data be 
such a struggle? I also wondered why Disney 
could not find better hold music. 

My colleague, a manager at one of Dis- 
ney’s retail locations, came back on the line. 
She would not release the data, and gave no 
explanation. I explained that the research 
would remain in the company. I urged her to 
reconsider, given that the information could 
ultimately save the company money. “Sorry, 
no can do,” she said. “Have a magical day.” 

It is not surprising that most people are 
afraid to relinquish data, even internally. 
Releasing data is like letting guests see 
Mickey Mouse tear his own head off back- 
stage. If people see how things work, they 
might not want to come back. The holders 
of the data might also worry that exposing 
the information will get them into trouble. 
This fear is a major challenge for scientists 
attempting to do research that might very
well help the data holders.

After more than a year of trying to gather 
data at Disney, despite cajoling, harassing and 
coming as close as I could to actually bribing 
facility managers, I managed it for only a few 
buildings. Initially, I saw this as an organiza- 
tional failure for the company. But I realized 
that the problem was not Disney's alone. As 
scientists and engineers, we often assume 
that our research goals are the same as the 
goals of the people from whom we need to 
get the data, when in fact they might be quite 
different. We are not trained to effectively 
engage others in our research. 

In pursuing my PhD, I have learned that 
my job is not just to research, collect and ana- 
lyse data and present results. Scientists must 
be intimately involved in working with those 
who possess and control data, beyond sim- 
ply extracting information. We must teach 
those who will be most affected by our work 
how and why they should be involved in the 
research process. And it is crucial that we 
explore their goals, try to understand their 
apprehension and work to allay those fears. 

I should have tried to understand how my
research affected that Disney manager — 
perhaps buildings with proper energy-use 
practices would reduce costs or improve her 
employees’ comfort. Scientists and engineers 
often encounter resistance from the people 
who stand to benefit most from our work. 
It should not be that way. As the song goes, 
“There’s so much that we share, that it’s time 
we’re aware, it’s a small world after all.”


Andrew Peterman is a doctoral candidate 
in civil and environmental engineering at 
Stanford University in California. 
GRADUATE STUDENTS
Teaching aids research 


Teaching others helps science graduate 
students to improve their own research 
skills, according to a study (D. F. Feldon 
et al. Science 333, 1037-1039; 2011). The 
work compared science, technology, 
engineering or maths (STEM) graduate 
students who teach with those who
only conduct research. It examined the
quality and testability of hypotheses by 
the students at the start and end of an 
academic year, as well as the strength and 
design of their experiments, on the basis 
of assessments by independent scientific 
reviewers. The analysis is the first of its 
kind to measure the growth of skills, says 
lead author David Feldon, who studies 
STEM education at the University of 
Virginia in Charlottesville. He theorizes 
that teaching in STEM enhances early- 
career scientists’ understanding of what 
comprises good research. 


EMPLOYMENT 


Degree brings prospects 


About 52% of people who graduated from 
US professional science master’s (PSM) 
programmes in 2010-11 had new jobs
1-6 months after earning their degrees,
says a survey. Outcomes for PSM Alumni: 
2010/11, released on 23 August by the US 
Council of Graduate Schools (CGS) in 
Washington DC, found that 39% of those 
with new jobs had secured them through 
internships associated with their PSM. 
CGS president Debra Stewart found the 
numbers encouraging. “Employers are
seeing the value of the PSM,” she says.
Most of the jobs were research related. The 
survey had 320 responses from graduates 
of 58 PSM programmes. Advocates call the 
PSM a viable alternative to the PhD. 


WOMEN IN RESEARCH 
Romance beats science 


Young women who want romance show 
less interest in science, technology, 
engineering and maths (STEM) than
in other fields, says a study (L. E. Park
et al. Pers. Soc. Psychol. B. 37, 1259-1273;
2011). The authors gauged reactions of 
350 students to ‘romantic’ images such as 
candlelight and sunsets, to other images 
of books or libraries, and to chats about 
dating or tests. Those who saw and heard 
romantic content reported less interest in 
STEM. Such dynamics could contribute to 
women’s low representation in STEM, says
lead author Lora Park, a psychologist at the 
University at Buffalo in New York. 
A SENTENCE TO LIFE


BY IGOR TEPER 


Better to be dead.
Sitting on a park bench, watching
the swans scribe their inscrutable
trajectories on the lake's surface, streams of 
joggers, dog walkers, families with strollers 
flowing around him, but never too close, 
Julius felt more alone than ever. Better to be 
dead. 

He knew he couldn't blame Tabytha, could
only blame himself, but the whole thing was 
just so outrageous. If she hadn't broken his 
heart, then stomped on it; hadn't made him 
feel so betrayed and humiliated, he never 
would have posted all those detailed descrip- 
tions of meals consisting of boiled, broiled, 
braised, baked, grilled, poached, sautéed 
and stewed crow on the site where she doc- 
umented her meals. He never would have 
posted exhaustive, fawning reviews of every 
film in the Passion of the Zombie Vampire 
heptalogy on the site where she logged her 
media consumption. He never would have 
replaced all her profile photos with selec- 
tions from the ratemylooks.com all-time 
bottom-20 and updated her relationship 
status to ‘putrescing’.

Doing it, he'd been gleeful and self-satisfied,
even euphoric — but, afterwards, he'd
felt even worse than before, as if anticipating
the price he'd pay. Was now paying.

With cybercrime overtaking fleshcrime in 
annual costs; with cybercriminals capable, 
through self-replicating agent programs, of 
being in an unlimited number of places at 
once; with prisons overflowing, the Cyber- 
crime Act of 2032 had been overdetermined. 
The digitization of life that had made cyber- 
banishment meaningful had also made it 
inevitable. 

Cyberbanishment: the deletion of all his 
cyberaccounts and, far worse, the deacti- 
vation of his arfid and his arfreader. Sitting 
there, in the park, he was surrounded by peo- 
ple, but, to him, they were faceless ciphers, 
with no names, no interests, no relationships, 
none of the scores of personal details that 
their arfids were broadcasting for everyone 
else's arfreaders to project onto their retinas. 
And, to them, he was an ersatz person, reveal- 
ing nothing and thus capable of anything. 
Even the dogs, with their mandated arfids, 
were fuller members of society than him. 

For several hours, Julius had tried to make 
eye contact with someone, with anyone, but 
people's eyes just passed over him, as if he 
were an inanimate part of the landscape. 




The most he got, at least from adults, were 
several double takes as people saw him, then, 
worried that their arfreader was malfunc- 
tioning, looked at someone else, and, reas- 
sured, glanced, very quickly, back at him. 
Dogs and small children with no arfreaders 
did look at him, but their alarmed guardians 
quickly steered them away. 

He reminded himself that he'd lived with
no arfreader until his twelfth birthday, but it 
was one thing to have never experienced the 
world’s true contextual richness, and a very 
much worse thing to have lost it. 

“It's only for a year,” he told himself.

“A whole year!” his despondent self 
retorted. 

In a year, his friends would have forgotten
him twice over, half his vocabulary would 
be out of date, the entire culture would have 
passed him by. 

He closed his eyes, and squeezed his 
fingers into fists. As he tried to recapture, in 
his mind, the world he'd lost, something wet
brushed over his left wrist. 

He opened his eyes — a dog, a golden 
retriever, was licking the back of his hand. 
Seized by conflicting impulses of gratitude 
to the dog for acknowledging him and alarm 
at being so unceremoniously licked by a 
strange animal, Julius froze. 

“Don't worry — he only does that to peo- 
ple he likes,” said a woman's voice. 

Approaching, looking straight at him: 
short, athletic, dark-skinned, darker-haired 
— perhaps Indian — and... that was all he 
knew about her, which is to say he knew 
nothing. He couldn't remember the last time
he'd conversed with a complete stranger.

Disarming anxiety bloomed in the pit of his stomach. He fought
it back, aware of the irony, and forced
himself to focus on her, now just an arm’s 
length away, with his unaugmented sight. 
Her face was freckled, and pretty. 

“I’m Rometa,” she said, and offered her
hand. 

“I’m Julius.” As he extended his hand, he
was painfully aware that his palm was soaked 
with sweat. 

When they shook, only the softest alarm 
went off in the back of his head — he didn't 
know her vaccination status. Then again, her 
dog had already licked him. 

“You must really love dogs,” she said. 

“Why do you say that?” 

“Well, you didn’t recoil from him even 
though he’s...” she glanced around, then 
whispered, “illegal.” When she saw the con- 
fusion on his face, she raised her eyebrows 
and added: “No arfid.” 

He smiled, tasting bitterness. 

“That makes two of us,” he said. 

Now it was her turn to look confused. 

“I’ve been, um, cyberbanished,” he said.
“Couldn’t you tell?”

“I keep my arfreader off most of the time,”
she said. “Banished, huh? Must be quite a 
story.” 

“You keep it off? By choice?” 

“To you, that must seem like choosing to 
live in a prison.” 

“How do you interact, with people?” 

“I talk, I listen, I pay attention. If I want to
know something about someone, I just ask.” 

“But how do you know if you can trust 
people?” 

“I’ve learned that I usually can. Besides,
Shakespeare here is a better judge of char- 
acter than any arfreader will ever be. Aren't 
you, boy?” 

She bent down and ran her hand over the 
dog’s neck and shoulders, then back around 
to its throat and chin. The dog smiled and 
licked her open palm. 

“I just can't believe it,” Julius said, shaking
his head. 

“Come on our walk,” Rometa said, “and
I’ll show you a thing or two about this prison
called real life.” 

As he jogged after her into the great 
unknown mysterious world, Julius felt something
he hadn't in a long time. Anticipation.


Igor Teper lives with his wife and son in 
the San Francisco Bay Area and teaches 
old atoms new tricks at temperatures 
near absolute zero. He also writes stories, 
occasionally. 
A close nuclear black-hole pair in the spiral galaxy
NGC 3393


G. Fabbiano¹, Junfeng Wang¹, M. Elvis¹ & G. Risaliti¹,²


The current picture of galaxy evolution’ advocates co-evolution of 
galaxies and their nuclear massive black holes, through accretion 
and galactic merging. Pairs of quasars, each with a massive black 
hole at the centre of its galaxy, have separations of 6,000 to 300,000 
light years (refs 2 and 3; 1 parsec = 3.26 light years) and exemplify 
the first stages of this gravitational interaction. The final stages of 
the black-hole merging process, through binary black holes and 
final collapse into a single black hole with gravitational wave emis- 
sion, are consistent with the sub-light-year separation inferred 
from the optical spectra‘ and light-variability* of two such quasars. 
The double active nuclei of a few nearby galaxies with disrupted 
morphology and intense star formation (such as NGC 6240 with a 
separation’ of about 2,600 light years and Mrk 463 with a separa- 
tion’ of about 13,000 light years between the nuclei) demonstrate 
the importance of major mergers of equal-mass spiral galaxies in 
this evolution; such mergers lead to an elliptical galaxy’*, as in the 
case of the double-radio-nucleus elliptical galaxy 0402+379 (with a
separation of about 24 light years between the nuclei)’. Minor 
mergers of a spiral galaxy with a smaller companion should be a 
more common occurrence, evolving into spiral galaxies with active 
massive black-hole pairs’, but have hitherto not been seen. Here we 
report the presence of two active massive black holes, separated by 
about 490 light years, in the Seyfert'’ galaxy NGC 3393 (50 Mpc, 
about 160 million light years). The regular spiral morphology and 
predominantly old circum-nuclear stellar population’ of this 
galaxy, and the closeness of the black holes embedded in the bulge, 
provide a hitherto missing observational point to the study of galaxy/ 
black hole evolution. Comparison of our observations with current 
theoretical models of mergers suggests that they are the result of 
minor merger evolution”’. 

NGC 3393 (Table 1) was observed with the Chandra X-ray 
Observatory’s camera ACIS-S on 28 February 2004 (ObsID 4868 for
29.7 kiloseconds) and 12 March 2011 (ObsID 12290 for 70 kiloseconds),
giving us a total of 89.7 kiloseconds after screening for background
events exceeding three standard deviations over the mean background
level. We used sub-pixel imaging with a quarter of the native 0.492″
ACIS-S pixel, to recover the mirror resolution (~0.4″ half-maximum
radius). We improved the Chandra and Hubble Space Telescope astrometry
using stars in the field. Details are in the Supplementary
Information. We used XSPEC software for the spectral analysis, and 
CIAO and DS9 software for other analyses. 

Figure 1 shows a marginally extended source in the 3-8-keV band, 
suggesting some complexity in the emission. The spectrum shows a 
featureless 3-6-keV continuum and a prominent 6.4-keV Fe K emission 
line, as previously reported’’. We find that the continuum and line 
images differ: there is a single point-like source (upper left of the image, 
to the northeast, NE) in the 3-6-keV band, coincident (within error) 
with the nucleus observed by the Hubble Space Telescope, whereas the 
6-7-keV image contains two sources with centroids 0.6″ (~150 parsec,
pc) apart. The ratio of the 6-7-keV and 3-6-keV images shows relatively 
more prominent Fe K emission in the source in the lower right of the 
image (to the southwest, SW), with a position consistent with the maser! 
located by very long baseline interferometry (VLBI). The spectra of both 
sources (Fig. 2) are typical of Compton-thick active galactic nuclei. 
Fits to power-law and reflection-component models (Table 2) show that 
both have observed (2-10 keV) luminosity of a few 10⁴⁰ erg s⁻¹.
The emitted luminosities estimated from the Fe K line intensity’* are
3.4 × 10⁴² erg s⁻¹ (NE) and 5.0 × 10⁴² erg s⁻¹ (SW). These high
luminosities and spectral shapes exclude a starburst contribution, con- 
sistent with the predominantly old central stellar population’’. 
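
The 0.6″ offset between the two Fe K centroids can be turned into a physical scale with the small-angle approximation. The following is a minimal sketch, assuming the 50-Mpc distance and the 3.26 light years per parsec conversion quoted in the text; the function and variable names are illustrative and are not taken from the authors' analysis.

```python
import math

# Number of arcseconds in one radian (about 206,265).
ARCSEC_PER_RADIAN = 180.0 * 3600.0 / math.pi
# Conversion quoted in the text: 1 parsec = 3.26 light years.
LIGHT_YEARS_PER_PARSEC = 3.26


def projected_separation_pc(angle_arcsec, distance_mpc):
    """Projected separation in parsecs, using the small-angle approximation."""
    distance_pc = distance_mpc * 1.0e6
    return (angle_arcsec / ARCSEC_PER_RADIAN) * distance_pc


# Values from the text: 0.6 arcsec offset, galaxy at about 50 Mpc.
sep_pc = projected_separation_pc(0.6, 50.0)
sep_ly = sep_pc * LIGHT_YEARS_PER_PARSEC
print(f"{sep_pc:.0f} pc, {sep_ly:.0f} light years")
```

With these inputs the separation comes out at roughly 145 pc (about 470 light years), in line with the rounded ~150 pc and ~490 light years quoted in the text.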

If the 2-10-keV emission were solely from reflected nuclear emis- 
sion in a Compton-thick active galactic nucleus, the large dimensions 
of the reflector (thick clouds near the nucleus) would preclude vari- 
ability. However, variability is implied by the lower flux found in the 


Table 1 | Summary of published results from X-ray observations of NGC 3393 


Columns: Observatory and date of observation; Γ; N_H (cm⁻²); X-ray band (keV); observed X-ray flux, F_X (10⁻¹³ erg cm⁻² s⁻¹); X-ray luminosity, L_X (10⁴⁰ erg s⁻¹); intensity, I(Fe K) (10⁻⁶ photons cm⁻² s⁻¹); equivalent width in Fe K (keV)
BeppoSAX?? 1.7 (fixed) >1 x 1025 20-100 54 180 9.6 1.9+3§ 

1 August 1997 , 
BeppoSAXx?° 2.8) 44ré ; x 1024 20-100 NA NA 14 4+2 

1 August 1997 : 

XMM-Newton?® 16412 >9x107 2-10 0.9+98 3.1%24 2.5 14+08 
5 July 2003 , : 

Chandra’? 1.9 (fixed) 4.7 x 102° (Gal.) 2-10 2.1+0.04 7.5 + 0.1(observed) 4.2 14+0.7 
28 February 2004 720 (Fe kK) >100 (PEXRAV) 

Suzaku?©?7 1.527233 17th x 1024 2-10 4 14 43 05202 
23 May 2007 ; 

Suzaku?©?7 1,5 2*032 b72e3 x 1024 15-50 200 680 43 0520.2 
23 May 2007 , 

Swift?® 168% 53° NA 14-195 255 890 

monitoring : 


Equivalent width is the width of continuum spectrum corresponding to the line flux; Γ is the photon spectral index in a power-law model; N_H is the neutral hydrogen absorption column. ‘Gal.’ is the Galactic line of
sight (absorption due to the Milky Way); ‘fixed’ means that the variable was not allowed to vary in the fit; NA, not available. The galaxy NGC 3393 (distance about 50 Mpc; Ly ~ 3.4 x 10?° solar luminosities) has a 
Seyfert 2 nucleus, detected in the Ha emission line?”, 13-m infrared2* and radio**25, The prominent 6.4-keV Fe Ka line and luminous (Lx ~ 2 x 10%” ergs” +) high-energy X-ray (>10 keV) emission!3!5212628 
suggest a Compton-thick active galactic nucleus. The X-ray luminosity values Ly are observed values including continuum and Fe K line emission, except when inferred from the Fe K or reflection component (using 


the PEXRAV model®°) luminosities and noted as such’. 


¹Harvard-Smithsonian Center for Astrophysics (CfA), 60 Garden Street, Cambridge, Massachusetts 02138, USA. ²INAF-Arcetri, Largo Enrico Fermi 5, Firenze 50125, Italy.




©2011 Macmillan Publishers Limited. All rights reserved 


LETTER 




Figure 1 | Chandra ACIS-S images. North is up and East to the left. a, Image 
in the 3-8-keV spectral band of the NGC 3393 nuclear region with quarter- 
subpixel binning, smoothed with a σ = 0.25″ Gaussian. Contours of the Hubble
Space Telescope F664N Hα emission (ref. 17) and Very Large Array (VLA)
radio telescope 8.4-GHz emission (ref. 25) are shown in grey and green, 
respectively. The diamond and cross indicate the positions of the sources 
observed by the Hubble Space Telescope’’ and VLBI", respectively. The X-ray 
source contains 279 + 16 counts in the 3-8-keV band. b, Image in the 3-6-keV 
spectral band, showing continuum emission dominated by the NE source at 
right ascension (RA) 10 h 48 min 23.47 s and declination (Dec.) −25° 09′ 43.1″
(all positions are J2000.0), coincident with the Hubble Space Telescope 
position’’. This position is also consistent with that of the 13-μm source
detected in the Very Large Telescope’s visible+infrared (VLT/VISIR) image” 


observation’® by the X-ray Multi-Mirror Mission (XMM)-Newton 
orbiting space observatory. Transitions from Compton-thick to 
Compton-thin have been observed in some active galactic nuclei’®, 
suggesting temporary ‘holes’ in the wall of obscuring clouds. The 
dimming observed by XMM-Newton may be related to the NE source, 
because an intrinsically obscured power-law component, N_E ∝ E^(−Γ),
where E is energy, with parameters Γ ~ 1.9 and N_H ~ 2 × 10²³ cm⁻²
could fit its spectrum, suggesting that some direct nuclear emis- 
sion may be visible. A passing broad-line-region cloud of 
Ny ~ 2X 10%? cm~* may have obscured this component during the 
XMM-Newton observation, leaving only residual scattered continuum. 
Even so, given the large Fe K equivalent width (Table 2), we may be 
seeing only reflected emission. In this case, we would be observing an 
additional absorption of N_H ~ 2 × 10²³ cm⁻² towards the (warm)
reflector. 

The SW nuclear source, with weaker, flat continuum, and no optical 
Hα counterpart” (Fig. 1), has prominent, and possibly complex, Fe K
emission. Although its luminosity is consistent with the most lumin- 
ous ultraluminous sources detected in nearby galaxies, its spectrum 
argues for a Compton-thick active galactic nucleus. A 6.5-keV Fe line 
has been reported in the M82 X-1 ultraluminous source, but this line is 


(P. Gandhi, personal communication). c, Image in the 6-7-keV band including 
both continuum and Fe K line emission; the dashed circles are the spectral 
counts extraction areas. d, Ratio of images shown in b and c; the values of the
ratios in the regions indicated by the dashed outlines are 0.61 ± 0.04 (NW),
1.14 ± 0.10 (SE) and 0.46 ± 0.02 (in between). The Fe K emission is relatively
more prominent in a source to the SW of the continuum source, at RA
10 h 48 min 23.45 s, Dec. −25° 09′ 43.6″. The latter position is closer to the
nuclear position from the Chandra ObsID 4868 image’*, which was based on 
the centroid of the Fe K emission. Given the astrometric uncertainty, the SW 
source is consistent with the VLBI position of the nuclear maser’. The sources 
are visible in both Chandra observations, although statistics are limited in the 
first. In the previous Chandra analysis’*, subpixel binning and imaging in 
separate spectral bands were not pursued. 


very broad and its equivalent width is model-dependent’’. The Fe K 
line from the SW source instead has the well-defined narrow core 
found in Compton-thick nuclei, understood to be fluorescent emission 
excited by the scattered nuclear radiation’. Moreover, unlike the SW 
source, strong continuum emission dominates the Chandra spectra of 
ultraluminous sources”. 

We can rule out the interpretation that the two sources are the result 
of a single active galactic nucleus interacting with clouds. Although the
spectral shape of the emission of the SW source cannot exclude a local 
mirror reflecting flux from the NE source, given the ~150-pc separation
and the Chandra limit on size of <0.4″, the SW source covering
factor exceeds 1/70, implying an intrinsic luminosity of the NE source 
well above its measured value. Also, in the reflection hypothesis, we 
should be detecting Hα and [O 1] emission from the SW source, but
none is seen’; this lack argues for a totally cocooned source, where the 
Fe K emission is seen in transmission’*. The spatial coincidence with 
the maser emission’* observed by VLBI (Fig. 1) reinforces this conclu- 
sion. The NE source cannot be due to reflection by a similarly small 
reflector, given both its variability’® and the modest far-infrared luminosity
of the system: ~3 × 10⁴³ erg s⁻¹ at 60 μm, based on the Infrared
Astronomical Satellite (IRAS) point source catalogue flux of 2.25 Jy, a
value comparable with the estimated intrinsic nuclear emission.
Finally, a jet interacting with the interstellar medium in the host galaxy 
bulge is also ruled out because the luminosity measured in the Fe K line 
would require extreme conditions in terms of shock energy and gas 
ionization, and produce copious soft X-ray emission not seen with 
Chandra. 
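
The far-infrared luminosity quoted above can be checked from the IRAS flux density. The following is a minimal sketch, assuming that the quoted figure is the monochromatic luminosity νL_ν at 60 μm for a source at 50 Mpc; the constants are standard values and the function name is illustrative, not part of the authors' analysis.

```python
import math

C_CM_PER_S = 2.998e10   # speed of light in cm/s
CM_PER_MPC = 3.0857e24  # centimetres per megaparsec
CGS_PER_JY = 1.0e-23    # 1 Jy in erg s^-1 cm^-2 Hz^-1


def nu_l_nu(flux_jy, wavelength_um, distance_mpc):
    """Monochromatic luminosity nu * L_nu in erg/s for an isotropic source."""
    nu_hz = C_CM_PER_S / (wavelength_um * 1.0e-4)  # micrometres to cm
    d_cm = distance_mpc * CM_PER_MPC
    return 4.0 * math.pi * d_cm ** 2 * nu_hz * flux_jy * CGS_PER_JY


# Values from the text: 2.25 Jy at 60 micrometres, distance about 50 Mpc.
l_60 = nu_l_nu(2.25, 60.0, 50.0)
print(f"{l_60:.2e} erg/s")
```

Under these assumptions the result is a few times 10⁴³ erg s⁻¹, the same order as the value stated in the text.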

We conclude that there are two obscured active galactic nuclei, each 
powered by accretion onto a massive black hole, in the central regions 
of NGC 3393. The emitted luminosities inferred from the Fe K lines are 
both a few 10⁴² erg s⁻¹, showing that both sources contribute to the
Compton-thick emission seen by the X-ray astronomy satellite 
BeppoSAX”’, with the SW source being more prominent. 

The inferred intrinsic X-ray luminosities, for the spectral energy 
distribution (SED) of a standard active galactic nucleus”, and using a 


Table 2 | Results of the spectral analysis of sources NE and SW 




Figure 2 | X-ray spectra. a, The NE source, with intrinsically absorbed power- 
law best fit. b, The SW source, with best-fit PEXRAV (Table 1) model. Errors 
are one standard deviation. Spectra were extracted from the regions shown in 
Fig. 1c with background from a source-free region 10″ to the east of the nuclei,
and fitted with XSPEC (version 12.6.0) using the C-statistic. The NE spectrum,
which shows a downturn around 3 keV, is well fitted with an intrinsically
absorbed (in addition to the line-of-sight Galactic N_H ~ 4.7 × 10²⁰ cm⁻²)
power-law continuum plus Fe K line, although the uncertainties in the 
parameters are large and this model is not statistically a unique choice. Given 
the flat continuum and more prominent Fe K line, the SW source is likely to be 
Compton thick. An intrinsically absorbed power-law model fit gives only flat 
power laws and no intrinsic absorption. We have assumed a power-law + 
reflection component model for the continuum, fixing the N_H to the Galactic
value, that would be representative of Compton-thick emission’’. This model 
was also used for the NE source, as an alternative to the intrinsically absorbed 
power law. In the SW source the luminosity is dominated by the Fe K emission. 
X-ray luminosity L_X(3-6 keV) = (3.7 ± 0.7) × 10³⁹ erg s⁻¹, whereas
L_X(Fe K) = (1.2 ± 0.3) × 10⁴⁰ erg s⁻¹. The equivalent width of the Fe K line is
1.167 $85 keV (NE), and 2.771 +3 keV (SW) at the 68% confidence level. The 
Fe K line of the SW source appears broadened (0.13 ± 0.05 keV) and complex,
with a main peak at Fe Kα, and a secondary peak at Fe Kβ. The ratio of the Kβ
intensity to the Kα intensity, I(Kβ)/I(Kα), is 0.35 ± 0.19, consistent with the
expected range of I(Kβ)/I(Kα) ~ 0.12-0.2 for neutral or weakly ionized iron”.
The results are listed in Table 2. 


standard accretion-rate-to-luminosity conversion efficiency for quasars 
of 10% for the Eddington accretion rate, yield masses of ~8 × 10⁵ solar
masses and ~10⁶ solar masses for the NE and SW sources, respectively.
However, lower efficiencies and sub-Eddington accretion are possible,
so the above masses are lower limits. The dynamic VLBI mass measurement"
of ~3 × 10⁷ solar masses for the SW source implies that the
product of efficiency and accretion rate must be lower by a factor of 
approximately 30. 
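
The factor of about 30 follows from comparing the Eddington-limited mass with the dynamical one. The following is a minimal sketch, assuming the commonly used Eddington luminosity of about 1.26 × 10³⁸ erg s⁻¹ per solar mass for ionized hydrogen; that constant and the function names are assumptions introduced here for illustration, and the text does not state the bolometric luminosity the authors adopted.

```python
# Assumed standard value: Eddington luminosity per solar mass, in erg/s.
L_EDD_PER_SOLAR_MASS = 1.26e38


def eddington_luminosity(mass_msun):
    """Eddington luminosity (erg/s) for a black hole of the given mass."""
    return mass_msun * L_EDD_PER_SOLAR_MASS


def minimum_mass(l_bol_erg_s):
    """Smallest mass (solar masses) able to sustain l_bol at the Eddington limit."""
    return l_bol_erg_s / L_EDD_PER_SOLAR_MASS


# Masses quoted in the text for the SW source.
m_lower_limit_sw = 1.0e6  # from the X-ray luminosity, assuming Eddington accretion
m_vlbi_sw = 3.0e7         # dynamical mass from VLBI maser measurements

# Bolometric luminosity that a 1e6 solar-mass black hole emits at the limit.
l_bol_implied = eddington_luminosity(m_lower_limit_sw)

# If the true mass is the VLBI value, the source radiates below the limit by:
shortfall = m_vlbi_sw / m_lower_limit_sw
print(f"L_bol ~ {l_bol_implied:.2e} erg/s, shortfall factor ~ {shortfall:.0f}")
```

Because the Eddington luminosity scales linearly with mass, the ratio of the two masses is the factor by which the product of efficiency and accretion rate must fall below the assumed values.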

If the masses of the two massive black holes are similar, the relation” 
of the mass of the galactic bulge to the mass of the massive black hole 
suggests that NGC 3393 may be the remnant of a major merger of two 
similar spiral galaxies. After about five billion years, this merger would 
produce a remnant with prominent ‘grand-design’ spiral arms and 
massive black-hole binary separation, as seen in NGC 3393 (ref. 8). 
However, the stellar population of this bulge would be significantly 
rejuvenated (L. Mayer, personal communication), which is at odds 
with the age of the stellar population in the central 200 pc of
NGC 3393 (ref. 12). Moreover, in a major merger, the merging time- 
scale* for two massive black holes at a separation of 150 pc is about a 
million years, making the detection of such events rare. 

Although this occurrence cannot be excluded on the basis of a single 
detection, a better explanation of our results may be the merger of 
galaxies of unequal mass (and therefore of massive black holes of 
unequal mass), which would result in a longer, billion-year timescale’®, 
and would be consistent with the lack of widespread star formation. 
The constraints on the masses of the two massive black holes allow this 
possibility. Interestingly, minor mergers may also have resulted in the 
growth of one of the massive black holes by promoting more active 
nuclear accretion in the smaller massive black hole’’. The denser 


Columns: Source; counts; Γ; N_H (cm⁻²); F_X(2-10 keV) (10⁻¹³ erg cm⁻² s⁻¹); L_X(2-10 keV) (10⁴⁰ erg s⁻¹); I(Fe K) (10⁻⁶ photons cm⁻² s⁻¹); L(Fe K) (10³⁹ erg s⁻¹); equivalent width in Fe K (keV); C-stat. (d.o.f.)

NE; 13429 1,9+06 (2.1 + 0.6) x 1073 1.3493 4.4*28 (observed) 2.0+0.5 7+17 1.2498 263 (337) 
340 (Fe k) 

NE; 134+9 1.7 (fixed) 4.7 x 10?°(Gal.) 1.2+0.2 3.8 + 0.6 (observed) 1.9+0.5 TEM 1.0497 263 (338) 
>60 (PEXRAV) , 

SW; 75+8 1.7 (fixed) 4.7 x 10°° 0.7 +02 2.3*02 (observed) 2.6+0.6 12.0 + 2.7 2.8%19 200 (338) 
500 (Fe K) >70 (PEXRAV) 


C-stat., statistic from the C method; d.o.f., degrees of freedom. The counts of each source were extracted in the 3-8 keV band. Flux and luminosity were estimated in the 2-10 keV band to compare with results in the 
literature. Flux and luminosity are observed values including continuum and Fe K line emission, except when inferred from the Fe K or reflection component (PEXRAV) luminosities and so noted, in which case they 
are estimates of the emitted values. Errors are one standard deviation (68%). Aperture corrections were applied to account for the ~35% of the spectral photons in the wings of the Chandra point-spread function 
missed from the extraction regions. The total observed (2-10 keV) luminosity of the combined NE+SW emission is consistent with the results from the first Chandra observation?’, which did not resolve the two 
sources (see Table 1 for these comparisons). Considering the difference in beam size, and the luminosity of the rest of the galaxy’, these fluxes are also consistent with the measurement from the X-ray astronomy 
satellite Suzaku. The XMM-Newton measurement instead gives a lower value of the combined luminosity, suggesting a factor-of-2 variability in the 2-10 keV range for the total NE+SW flux over a one-year 
timescale. At higher energies (>15 keV), the 5-year light curve of NGC 3393 (ref. 28) from the Burst Alert telescope (BAT) on the spacecraft Swift does not show any significant variability. 
circum-nuclear environment of the SW source, as suggested by our
results, may have resulted from such a process. 


Received 14 April; accepted 11 July 2011. 
Published online 31 August 2011. 
Structural basis of PIP₂ activation of the classical
inward rectifier K⁺ channel Kir2.2


Scott B. Hansen, Xiao Tao¹ & Roderick MacKinnon¹


The regulation of ion channel activity by specific lipid molecules is
widely recognized as an integral component of electrical signalling in
cells’. In particular, phosphatidylinositol 4,5-bisphosphate (PIP₂), a
minor yet dynamic phospholipid component of cell membranes, is
known to regulate many different ion channels” *. PIP₂ is the primary
agonist for classical inward rectifier (Kir2) channels, through which
this lipid can regulate a cell’s resting membrane potential”.
However, the molecular mechanism by which PIP₂ exerts its action
is unknown. Here we present the X-ray crystal structure of a Kir2.2
channel in complex with a short-chain (dioctanoyl) derivative of
PIP₂. We found that PIP₂ binds at an interface between the transmembrane
domain (TMD) and the cytoplasmic domain (CTD). The
PIP₂-binding site consists of a conserved non-specific phospholipid-binding
region in the TMD and a specific phosphatidylinositol-binding
region in the CTD. On PIP₂ binding, a flexible expansion
linker contracts to a compact helical structure, the CTD translates
6 Å and becomes tethered to the TMD and the inner helix gate
begins to open. In contrast, the small anionic lipid dioctanoyl
glycerol pyrophosphatidic acid (PPA) also binds to the non-specific
TMD region, but not to the specific phosphatidylinositol region, and
thus fails to engage the CTD or open the channel. Our results show
how PIP₂ can control the resting membrane potential through a
specific ion-channel-receptor-ligand interaction that brings about
a large conformational change, analogous to neurotransmitter
activation of ion channels at synapses.

PIP₂ influences the metabolic state of cells by at least three distinct
pathways (Supplementary Fig. 1a, b): first, as the prototypical second
messenger being cleaved into diacyl glycerol and inositol triphosphate'®"';
second, as a localization signal targeting soluble proteins
to the plasma membrane’*"; and third, as a signalling molecule capable
of agonizing an ion channel’ *"*"*. This latter role, in which an ion
channel is activated by PIP₂, was first discovered in 1998 when it was
shown that PIP₂ acted alone to open a Kir channel’.

Figure 1a, b shows the influence of PIP₂ on the function of Kir2.2
from chicken. Following excision of an inside-out membrane patch
from a Xenopus oocyte expressing Kir2.2 channels, initially large
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Figure 1 | Effect of a short-chain PIP, on Kir2.2. a, Endogenous PIP, 
depletion causes ‘run down’ of Kir2.2 channels in an excised inside-out patch from 
Xenopus oocytes as shown by the three macroscopic current traces recorded with a 
voltage ramp from —80 to +80 mV immediately (green), 30 min (blue) and 50 min 
(black) after patch excision. b, The short-chain PIP added to the bath solution (solid 
line with concentration indicated below) beginning 32 min after patch excision 
partially rescued Kir2.2 channelactivity. The bath was then perfused (dashed line) at 
time = 40 min with ~1 ml min’ 'bathsolution for3 min.c, X-ray crystal structures 
of apo- (left, PDB code 3JYC) and PIP -bound (right, PDB code 3SPI) Kir2.2 


tetramer (grey «-carbon traces) viewed from the side with the extracellular solution 
above. The lipid bilayer boundaries are shown as grey bars. Four PIP molecules are 
shownas sticks and coloured according to atom type: carbon, yellow; phosphorous, 
orange; and oxygen, red. One PIP, molecule in a similar orientation as in Fig. 2a is 
outlined bya black box. On PIP; binding the flexible linker between CTD and TMD 
consisting of two strands (highlighted green for one subunit, dotted line indicating 
disordered region in the crystal structure) form helical structures, and the CTD 
translates towards the TMD by 6 A. A set of reference atoms (Asp 72 and Lys 220 
a-carbons) are highlighted as blue spheres in each structure. 
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inward K* currents diminish over time. The diminution occurs 
because PIP, is depleted from the membrane’s inner leaflet®. The K* 
currents can be restored partially by exposing the cytoplasmic face of 
the patch to the short-chain derivative of PIP, in a dose-dependent 
manner’”’* (Fig. 1b). PIP2 is the primary agonist for Kir2 channels, 
through which this lipid can regulate a cell’s resting membrane poten- 
tial. Here we use X-ray crystallography to understand the mechanism 
by which PIP, opens a Kir2 channel. 

Kir2.2 is a tetrameric ion channel composed of a TMD, which forms
the prototypic potassium-selective pore, and a large CTD, which
characterizes all Kir channels (Fig. 1c). We determined the structure
of wild-type Kir2.2 from chicken (with disordered segments of the
amino and carboxy termini truncated) in the presence of the
short-chain derivative of PIP2 at 3.3 Å resolution (Fig. 1c). We also
determined the structures of two point mutants, I223L and R186A, in
the presence of PIP2 at 3.0 Å and 2.6 Å resolutions, respectively
(Supplementary Table 1). These mutants were studied because they are
described in the literature as altering the apparent affinity for
PIP2. All three channels have similar overall structures and taken
together enhance our knowledge of the detailed chemical properties
through which PIP2 binds to and modifies the channel’s structure and
function (Supplementary Fig. 2a, b). Electron density maps are of high
quality for the entire protein, and strong density for the three
phosphates in PIP2, observed in all three structures, allowed accurate
placement of the ligand (Supplementary Fig. 3a). Furthermore, the
glycerol backbone of PIP2 is well ordered and easily placed in the
higher resolution structures, along with 4-6 carbons of the lipid acyl
chains (Supplementary Fig. 3a). One PIP2 molecule binds to each of the
four channel subunits (Fig. 1c).

PIP2 binds at the interface between the TMD and CTD and produces a
large conformational change in Kir2.2 (Fig. 1c). The entire CTD
translates 6 Å towards the TMD in association with the formation of
two new helices, an N-terminal extension of the ‘interfacial’ helix
and a ‘tether’ helix at the C terminus of the inner helix (Fig. 2a and
Supplementary Fig. 3b). The 6 Å translation of the CTD is reflected in
a compression along the c-axis of the unit cell (Supplementary
Table 1). The protein conformational changes position amino acids that
form the binding site for the 4′,5′-phosphate-substituted inositol
head group of PIP2.

The PIP2-binding site comprises amino acids from two main structural
regions of the channel. The acyl chains, glycerol backbone and 1′
(phosphodiester) phosphate of PIP2 interact with the TMD, while the
inositol head group makes interactions with the CTD (Fig. 2a and
Supplementary Fig. 4). In detail, the acyl chains insert into the
membrane layer where they interact with hydrophobic amino acids on both
the inner and outer helices, while the 1′ phosphate makes interactions
with amino acids forming the sequence arginine-tryptophan-arginine
(amino acids 78-80 in Kir2.2) (Fig. 2b). This sequence is conserved as
arginine-tryptophan-arginine or lysine-tryptophan-arginine among
many different Kir channels, and the reason for this conservation is
made clear by the PIP2 complex: the arginine-tryptophan-arginine
sequence is located at the N terminus of the outer helix and forms a
binding site in which the 1′ phosphate caps the helix and is cradled by
main-chain amide nitrogen atoms and the guanidinium groups from
the two arginine residues (Fig. 2a and Supplementary Fig. 4). The
tryptophan residue appears to anchor the end of the outer helix at
the membrane interface and also to interact with one of the acyl
chains. With the acyl chains, glycerol backbone and 1′ phosphate of the
lipid molecule contacting the TMD, the inositol ring of the head group
is oriented towards the CTD, where the 4′ and 5′ phosphates are
positioned to interact directly with Lys 183, Arg 186, Lys 188 and
Lys 189 (Fig. 2a and Supplementary Fig. 4). The latter two positively
charged amino acids are located on the tether helix, the structure of
which is induced by the binding of PIP2. Other amino acids on the
tether helix, including Arg 190, participate in the formation of a
hydrogen-bonding network that seems to strengthen the interaction
between the tether helix and other regions of the CTD, especially the
N-terminal extension of the interfacial helix, the structure of which
is also induced by the binding of PIP2 (Fig. 2a and Supplementary
Fig. 4). A sequence alignment shows that the amino acids binding to
PIP2 are highly conserved among the large family of inward rectifier
K+ channels (Fig. 2b). Because all members of this ion channel family
seem to be regulated by PIP2 (some in concert with other ligands such
as ATP or G proteins), we anticipate that the PIP2 site described here
will be observed in many other inward rectifiers.

Figure 2 | PIP2-binding site. a, A detailed view of the PIP2-binding
site is shown in a similar orientation as outlined in Fig. 1c. Helices
(shown as ribbon) from different subunits are distinguished by their
interior colour (orange and cyan). Residues hydrogen bonded (dashed
lines) to PIP2 are coloured green, and residues stabilizing the
PIP2-binding site in the CTD but lacking direct contact are coloured
blue. All side chains are shown as sticks. PIP2 is shown as sticks and
coloured according to atom type: carbon, yellow; phosphorus, orange;
and oxygen, red. b, An amino acid sequence alignment of selected
eukaryotic Kir channels showing residues predicted from the literature
(blue outline) and not predicted (purple outline) to interact with
PIP2. Residues with direct bonding interactions to PIP2 and with a
structural role are highlighted in green and blue, respectively. The
two residues serving as the inner helix gate are highlighted in grey.

The detailed chemical properties of the PIP2-binding site suggest that
the TMD region should bind to any lipid that contains a glycerol
backbone, acyl chains and a 1′ phosphate, whereas the CTD should
provide the specificity for the inositol phosphate head group.
Moreover, because the head group region of the binding site is formed
only after the conformational changes occur in the channel, we would
predict that a glycerol phospholipid without an inositol head group
would bind to the TMD but not induce the conformational changes. We
tested this prediction by determining a 2.45 Å resolution crystal
structure of Kir2.2 in the presence of a short-chain derivative of
pyrophosphatidic acid (PPA), which contains as a head group only
phosphoric acid instead of the 4′,5′-phosphate-substituted inositol
ring (Supplementary Table 1). This lipid is bound to the TMD in a
manner almost identical to PIP2; however, the head group does not
interact with the CTD and the protein conformational changes induced
by PIP2 do not occur (Fig. 3a-c). This finding is compatible with
recent functional studies showing that small head group anionic lipids
failed to activate Kir channels in the absence of PIP2.

Figure 3 | Conserved non-specific lipid-binding site in Kir channels.
a, A grey α-carbon representation of Kir2.2 tetramer in complex with
PPA, a small anionic lipid lacking an inositol ring. PPA-bound Kir2.2
assumes a closed conformation similar to apo-Kir2.2 (PDB code 3JYC)
with the flexible linker elongated and the CTD unengaged. The four PPA
molecules are shown as sticks and coloured according to atom type:
carbon, yellow; phosphorus, orange; and oxygen, red. b, A close-up
view of the PPA-binding site. PPA contacts Kir at the cytoplasmic end
of the outer helix, making strong interactions with the guanidiniums
of R78 and R80 and the backbone amide nitrogens of the helix turn,
similar to the interactions of the 1′ phosphate of PIP2. However,
residues (blue sticks) for interacting with the PIP2
inositol-4′,5′-phosphate remain distant to the lipid-binding site;
R186 orients with its side chain pointing towards the ion-conduction
pathway. c, Superposition of PPA (coloured the same as in a) and PIP2
(grey).

We wish to understand how the PIP2-induced conformational
changes relate to channel activity. Comparison of the inner helix gate
in the PIP2 and PPA complexes shows that the interaction of the CTD
with the TMD induced by PIP2 is key to opening the gate (Fig. 4a-c). In
the PPA complex, in which the CTD is extended away from the TMD,
the gate in the TMD is tightly closed (4.9 Å at Ile 177), whereas in the
PIP2 complex the inner helices have begun to separate (6.3 Å). The
separation of helices comes about as a result of a slight splaying, but
more significantly a rotation of the inner helices, which moves
hydrophobic amino acid side chains away from the ion pathway
(Fig. 4a-c). Opening of the inner helix gate to approximately 6.3 Å
(approximately 5 Å diameter between van der Waals surfaces of carbon
atoms at the narrowest region) is probably still insufficient to
permit ion conduction, but the gate is clearly on the way to an open
conformation. A previously published study of prokaryotic Kir channels
proposed that interactions between the TMD and CTD of those channels
influence the distribution of ions in the selectivity filter. We
observe no such influence of the CTD on ions in the filter in the
high-quality electron density maps in our analysis of Kir2.2
(Supplementary Fig. 5a-c). In the present study, the data support a
simple allosteric mechanism of gating control by the signalling lipid
PIP2, in which the lipid mediates docking of the CTD to the TMD and
opening of the inner helix gate, as depicted in Fig. 4d.

The ion pathway in Kir channels has a second constriction formed
by the G-loop, at the apex of the CTD. This loop in some instances is
thought to function as a gate, referred to as the G-loop gate. In
Kir2.2 the conformation of the G-loop is altered by PIP2, either
directly through the binding of PIP2 or indirectly through the docking
of the CTD to the TMD when PIP2 binds (Supplementary Fig. 6a, b). But
it seems that PIP2 does not control the G-loop gate to a large extent,
because in both conformations this gate is open (smallest diameter
7.8 Å) (Supplementary Fig. 6a). The mutation I223L affects the
conformation of the G-loop gate in a manner that might explain the
apparent increased affinity for PIP2. In this mutant, although the
CTD does not bind to the TMD and the tether and interfacial helices
do not form, the G-loop adopts its PIP2-bound conformation
(Supplementary Fig. 6c, d). It is thus possible that the mutant
favours PIP2 binding by tending to favour the bound configuration
before PIP2 binds.

Figure 4 | A proposed mechanism of Kir2.2 activation by PIP2.
a, Superposition of the TMD inner helices of the PIP2-bound (blue
ribbon) and apo- (red ribbon) Kir2.2 structures. PIP2 binding results
in a splaying of the helices near the helix bundle activation gate.
b, c, Comparison of the inner helix bundle gate in PPA-bound Kir2.2 (b)
and PIP2-bound Kir2.2 (c) viewed from the extracellular side. Side
chains of the residues in the bundle crossing are represented as either
grey sticks or space-filling CPK models (carbon, yellow; and sulphur,
green). d, A proposed mechanism for Kir2.2 activation by PIP2. PIP2
(purple sphere) binds at an interface between the TMD (grey cylinder)
and the CTD (grey rectangle) and induces a large conformational change:
a flexible linker (green line) contracts to a compact helical structure
(green cylinder), the CTD translates towards and becomes tethered to
the TMD, the G-loop (cyan wedge) inserts into the TMD and the inner
helix activation gate opens.

The membrane lipid PIP2 has a central role in cell signalling through
three distinct pathways (Supplementary Fig. 1a). In one of these
pathways PIP2 acts directly on specific ion channels to regulate their
activity. PIP2 is the primary agonist for Kir2 channels, which control
the resting membrane potential in many cells. Since this discovery
more than ten years ago, this form of ion channel regulation has been
a topic of intense study. The crystal structures presented here reveal
the mechanism of PIP2 activation of Kir2 channels. PIP2 binds to a
lipid-binding site at the membrane’s inner leaflet, and through
specific interactions between the 4′,5′-inositol-phosphate head group
and the channel a large conformational change occurs, initiating pore
opening.


METHODS SUMMARY 


Chicken Kir2.2 with a C-terminal GFP and a 1D4 epitope was expressed in Pichia
and purified in n-decyl-β-D-maltopyranoside (DM, Anatrace) by 1D4 antibody
affinity chromatography followed by PreScission protease cleavage and gel
filtration. Purified protein was concentrated to 9 mg ml⁻¹ and mixed with freshly
prepared dioctanoyl PIP2 (10 mM stock in water) or dioctanoyl PPA (100 mM
stock in water) to a final concentration of 0.6-1 mM and 5 mM, respectively.
Crystals, diffracting between 2.45 and 3.3 Å, were obtained from a 200 nl hanging
drop with 4 mM DM, 20 mM dithiothreitol, 3 mM TCEP, 0.5 M KCl and PEG 400
or PEG 4000 as a precipitant and cryoprotected in reservoir solution containing
25-30% glycerol. Phases were obtained by molecular replacement with apo-Kir2.2
(PDB code 3JYC) using MolRep in the CCP4 suite. The models were built in
Coot and refined in Phenix to Rfree of 0.22 to 0.28. Complete crystallographic
data and refinement statistics are shown in Supplementary Information.

Electrophysiology experiments were conducted using patch clamp on Xenopus
oocytes expressing wild-type Kir2.2. Briefly, oocytes were injected with 50 nl
(~2 mg ml⁻¹) cRNA and used for patch recording after 2-3 days. Large pipette
tips with typical resistance of 0.4-0.9 MΩ were used.


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 


Cloning, expression and purification. Kir2.2 from chicken with a GFP and a 1D4
epitope at the C terminus was expressed in Pichia and purified in
n-decyl-β-D-maltopyranoside (DM, Anatrace) by 1D4 antibody affinity
chromatography followed by PreScission protease cleavage and gel filtration on a
Superdex 200 column as previously described. Purified protein was concentrated
to 9 mg ml⁻¹. For crystallization trials of the PIP2-Kir2.2 channel complex,
freshly prepared PIP2 (10 mM stock in water) was added to the concentrated
protein at a final concentration of 0.6-1 mM lipid and 8 mg ml⁻¹ protein and
incubated for about an hour before setting up trays. For the crystallization
trials of the PPA-Kir2.2 channel complex, 5 mM PPA (100 mM stock in water) was
used.
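
The stock and final concentrations quoted above can be cross-checked with a simple dilution calculation. The sketch below is an illustration only: it assumes that the lipid stock is the only liquid added to the 9 mg ml⁻¹ protein and that volumes are additive, neither of which is stated in the text, and the function name `mix` is hypothetical.

```python
def mix(stock_mM, final_mM, protein_mg_ml):
    """Return (volume fraction of lipid stock, protein concentration after mixing).

    Uses C1 * V1 = C2 * V2, so the stock occupies final/stock of the final volume.
    Assumes additive volumes and that nothing else is added (an assumption).
    """
    fraction = final_mM / stock_mM
    return fraction, protein_mg_ml * (1.0 - fraction)


# Concentrations taken from the Methods; the starting protein is 9 mg/ml.
conditions = [
    ("PIP2, 0.6 mM final", 10.0, 0.6),
    ("PIP2, 1 mM final", 10.0, 1.0),
    ("PPA, 5 mM final", 100.0, 5.0),
]
for label, stock, final in conditions:
    fraction, protein = mix(stock, final, 9.0)
    print(f"{label}: stock is {fraction:.0%} of the volume, "
          f"protein about {protein:.2f} mg/ml")
```

Under these assumptions the protein ends up between roughly 8.1 and 8.5 mg ml⁻¹ in the PIP2 mixtures, in line with the approximately 8 mg ml⁻¹ quoted above.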

Structure determination. Co-crystals of Kir2.2 with PIP2 or PPA were obtained
from 200 nl (100:100 nl protein:reservoir mixture) hanging drops. The protein
buffer solution contained 4 mM DM, 20 mM dithiothreitol, 3 mM TCEP, 150 mM
KCl and 20 mM Tris-HCl pH 8.0. Reservoir solution yielding the best diffracting
crystals contained 0.3-0.6 M KCl, 50 mM HEPES (pH 6.5-7.5) plus 10-20% PEG
400 (w/v) or 3-8% PEG 4000 (w/v). Diamond-shaped crystals, 150-350 µm in the
longest dimension, grew within 48 h at 4 °C. The crystals were cryoprotected in
reservoir solution plus 25-30% (v/v) glycerol (5% increment steps) and flash
frozen in liquid nitrogen. Diffraction data were collected at beamlines X29 and
X25 (Brookhaven NSLS). Crystals with PIP2 or PPA diffracted to 2.6-3.3 Å or
2.45 Å, respectively. The crystals all belong to the I4 space group with one
subunit in the asymmetric unit. Phases were obtained by molecular replacement
from apo-Kir2.2 (PDB code 3JYC) using MolRep in the CCP4 suite. The models were
built in Coot and refined in Phenix to an Rfree of 0.22 to 0.28. There are no
Ramachandran outliers (97% most favoured, 3% allowed). Complete
crystallographic data and refinement statistics are shown in Supplementary
Table 1. The PIP2-bound model contains residues 42-369. In the PPA-bound
structure, part of the interfacial helix is disordered and the final model
contains residues 42-62 and 70-369. Waters were added with ARP/wARP in the CCP4
suite and manually adjusted in the 2.45 and 2.6 Å structures. Figures were made
with PyMOL.

Electrophysiology. cRNA of chicken Kir2.2 was made from NdeI-linearized
Kir2.2 in the pGEM vector using the Amplicap T7 RNA kit (Epicentre
Biotechnologies). Xenopus oocytes were prepared as described and injected with
50 nl of cRNA 12-20 h later. All recordings were made with patch clamp in
inside-out configuration 2-3 days after injection. Injected oocytes were treated
with ND96 (96 mM NaCl, 2 mM KCl, 1.8 mM CaCl2, 1 mM MgCl2, 50 µg ml⁻¹
gentamycin, pH 7.6 with NaOH) plus 200 mM NaCl for 5-10 min and the vitelline
membrane was removed before seal formation. On-cell membrane seals were
formed using pipettes with typical resistance of 0.4-0.9 MΩ and large inside-out
patches were excised with currents ranging from 0.2 to 5 nA and seals from 0.4 to
1 GΩ. The bath solution contained 130 mM KCl, 5 mM HEPES, 5 mM K3EDTA,
pH 7.4 with KOH. The pipette solution contained 140 mM KCl, 5 mM HEPES,
0.3 mM CaCl2, 1 mM MgCl2, pH 7.4 with KOH. For PIP2 rescuing experiments
described in Fig. 1b, 10 mM dioctanoyl PIP2 prepared in water was added to the
bath solution and mixed by pipetting.
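
Because the bath and pipette solutions contain similar K+ concentrations, the K+ equilibrium potential across the excised patch should sit close to 0 mV. The following sketch evaluates the Nernst equation for these solutions. It rests on several assumptions that are not in the text: a recording temperature of 22 °C, three K+ ions contributed per K3EDTA, no correction for the KOH used to adjust pH, and concentrations used in place of activities. The 4 mM and 140 mM values in the second call are generic illustrative numbers, not data from this study.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1


def nernst_mV(k_out_mM, k_in_mM, temp_C=22.0, z=1):
    """Equilibrium potential (mV, inside relative to outside) for an ion of valence z.

    temp_C = 22 is an assumed room temperature, not a value from the Methods.
    """
    temp_K = temp_C + 273.15
    return 1000.0 * R * temp_K / (z * F) * math.log(k_out_mM / k_in_mM)


# Inside-out patch: the pipette faces the extracellular side, the bath the
# cytoplasmic side. Bath K+ = 130 mM KCl + 3 x 5 mM from K3EDTA (assumed).
pipette_k = 140.0
bath_k = 130.0 + 3 * 5.0
print(f"E_K for the recording solutions: {nernst_mV(pipette_k, bath_k):.2f} mV")

# Illustrative physiological gradient (hypothetical values, not from the paper).
print(f"E_K for 4 mM outside, 140 mM inside: {nernst_mV(4.0, 140.0):.1f} mV")
```

With a reversal potential near 0 mV, a voltage ramp spanning negative and positive potentials samples both the large inward and the small outward currents of the inward rectifier.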

All patch recordings were made with a voltage ramp from +80 to −80 mV of
10 s duration under the control of an Axopatch 200B amplifier, Digidata 1440A
analogue-to-digital converter and pClamp10.1 software (Axon Instruments). For
Fig. 1b, the voltage ramp was repeated every 30 s after patch excision and the
amount of current at +70 mV was plotted against time (immediately after
excision: time = 0). Figure 1a and b were made with Igor Pro (Wavemetrics).
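
The ramp protocol described above can be sketched in a few lines: a linear command voltage from +80 to −80 mV over 10 s, and a read-out of the current at one chosen voltage in each sweep. This is a minimal illustration, not the acquisition code used in the study. The function names, the 1,000-point sampling and the purely ohmic test current are all hypothetical.

```python
def ramp_voltage(t_s, v_start=80.0, v_end=-80.0, duration_s=10.0):
    """Command voltage (mV) at time t_s (s) within one linear ramp."""
    if not 0.0 <= t_s <= duration_s:
        raise ValueError("t_s lies outside the ramp")
    return v_start + (v_end - v_start) * t_s / duration_s


def current_at(voltages, currents, v_target):
    """Linearly interpolate the current at v_target from one ramp sweep."""
    samples = list(zip(voltages, currents))
    for (v0, i0), (v1, i1) in zip(samples, samples[1:]):
        if v0 != v1 and min(v0, v1) <= v_target <= max(v0, v1):
            return i0 + (i1 - i0) * (v_target - v0) / (v1 - v0)
    raise ValueError("v_target is not covered by the sweep")


# One sweep sampled at 1,001 points (hypothetical sampling density).
n = 1000
times = [10.0 * i / n for i in range(n + 1)]
voltages = [ramp_voltage(t) for t in times]

# Hypothetical ohmic patch (0.02 nA per mV), used only to exercise the code;
# a real Kir2.2 patch would show strong inward rectification instead.
currents = [0.02 * v for v in voltages]
print(current_at(voltages, currents, 70.0))

# Sweeps repeated every 30 s give one such value per time point after excision.
sweep_start_times_s = [30 * k for k in range(5)]
```

Plotting the interpolated value from each sweep against its start time gives a time course of the kind shown in Fig. 1b.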


29. Cohen, S. X. et al. Towards complete validated models in the next generation of 
ARP/wARP. Acta Crystallogr. D 60, 2222-2229 (2004). 
30. DeLano, W. L. PyMOL. (http://www.pymol.org) (DeLano Scientific, 2002). 


©2011 Macmillan Publishers Limited. All rights reserved 


LETTER 


doi:10.1038/nature10379 


In vitro centromere and kinetochore assembly on 
defined chromatin templates 


Annika Guse, Christopher W. Carroll, Ben Moree, Colin J. Fuller & Aaron F. Straight


During cell division, chromosomes are segregated to nascent daughter
cells by attaching to the microtubules of the mitotic spindle through
the kinetochore. Kinetochores are assembled on a specialized
chromatin domain called the centromere, which is characterized by the
replacement of nucleosomal histone H3 with the histone H3 variant
centromere protein A (CENP-A). CENP-A is essential for centromere
and kinetochore formation in all eukaryotes but it is unknown how
CENP-A chromatin directs centromere and kinetochore assembly.
Here we generate synthetic CENP-A chromatin that recapitulates
essential steps of centromere and kinetochore assembly in vitro. We
show that reconstituted CENP-A chromatin when added to cell-free
extracts is sufficient for the assembly of centromere and kinetochore
proteins, microtubule binding and stabilization, and mitotic
checkpoint function. Using chromatin assembled from histone
H3/CENP-A chimaeras, we demonstrate that the conserved carboxy
terminus of CENP-A is necessary and sufficient for centromere and
kinetochore protein recruitment and function but that the CENP-A
targeting domain, which is required for new CENP-A histone assembly,
is not. These data show that two of the primary requirements for
accurate chromosome segregation, the assembly of the kinetochore and
the propagation of CENP-A chromatin, are specified by different
elements in the CENP-A histone. Our unique cell-free system enables
complete control and manipulation of the chromatin substrate and
thus presents a powerful tool to study centromere and kinetochore
assembly.

Metazoan centromeres are specified epigenetically by the presence of
CENP-A nucleosomes. Structural differences between CENP-A and
histone H3 nucleosomes and/or specific protein recognition elements
in CENP-A seem to provide the information that specifies centromere
identity and directs kinetochore assembly in a
DNA-sequence-independent manner. Moreover, many metazoan centromeres
are complex in their organization, with interspersed blocks of CENP-A
nucleosomes and histone H3 nucleosomes assembled on long arrays
of repetitive DNA. The difficulty in purifying and manipulating
complex centromeres has limited our understanding of how
centromeric chromatin promotes centromere and kinetochore formation
and chromosome segregation.

To mimic the arrays of CENP-A nucleosomes present in complex
vertebrate centromeres, we reconstituted human CENP-A chromatin
from recombinant components (Fig. 1a). We generated saturated
chromatin arrays by salt dialysis of purified histone proteins H2A,
H2B, H4 and either CENP-A or H3 with a biotinylated DNA template
containing 19 repeats of a 147 bp high-affinity nucleosome positioning
sequence (19×601) (Supplementary Fig. 1a, b). We bound the biotinylated
arrays to streptavidin-coated magnetic beads, thereby immobilizing
the arrays so that they can be easily added to and recovered from cell
extracts (Fig. 1a and Supplementary Fig. 1c-e).

We recently demonstrated that the essential centromere protein
CENP-C directly recognizes the C terminus of CENP-A in
mononucleosomes but not in isolated (CENP-A/H4)2 tetramers (our
unpublished observations). Therefore, we tested in vitro translated
human and Xenopus laevis CENP-C for binding to reconstituted H3 and
CENP-A chromatin. Human and Xenopus CENP-A are >50% identical
(Supplementary Fig. 2a) and we find that both human and Xenopus
CENP-C bind specifically to human CENP-A chromatin arrays in vitro,
when compared to H3 chromatin arrays (Supplementary Fig. 2b).

Figure 1 | Reconstituted CENP-A chromatin supports centromere assembly
in Xenopus egg extracts. a, A schematic showing the reconstitution of CENP-A
and H3 chromatin arrays and the attachment of the chromatin to magnetic
beads via biotin end-labelled DNA. b, Representative images comparing cenp-c
binding to human CENP-A (HsCENP-A) and H3 chromatin arrays in CSF and
interphase Xenopus extract. The left column shows the separate histone H4
staining used for normalization of the quantification, followed by staining for
DNA, human CENP-A and cenp-c. A merged image of the DNA (red) and cenp-c
(green) channels is shown in the right column. Scale bar, 5 µm. c, Quantification
of the array-associated centromeric proteins cenp-c, cenp-n and cenp-k in CSF
and interphase extracts, normalized to histone H4 levels. The levels are rescaled
so that CENP-A arrays in CSF are set at 1. Error bars represent the standard error
of the mean (s.e.m.), n = 3 (P < 0.05 between CENP-A and H3 chromatin arrays
for cenp-c, cenp-n and cenp-k).

Xenopus egg extract is a widely used cell-free system to study
chromosome segregation. Egg extracts are arrested in metaphase II of
meiosis by the activity of cytostatic factor (CSF) and the cell-cycle state
of the extract can be transitioned into interphase by adding calcium.
We developed a quantitative immunofluorescence assay to determine
whether centromere proteins bound to CENP-A chromatin arrays
when arrays were added to Xenopus egg extracts. CENP-N and
CENP-K are centromere proteins that are required for proper
centromere and kinetochore assembly in somatic cells, and we have
previously shown that CENP-N, similar to CENP-C, directly binds to the
CENP-A nucleosome. We found that cenp-c, cenp-n and cenp-k
specifically associated with CENP-A arrays independent of the
cell-cycle stage of the extract (Fig. 1b, c and Supplementary
Fig. 2c-f). The centromere protein cenp-t, which binds to either H3
nucleosomes or DNA at centromeres, did not selectively bind CENP-A
chromatin arrays (Supplementary Fig. 3a, b). Similarly, the inner
centromere protein incenp and polo-like kinase 1 (plk1) associated with
both types of chromatin arrays (Supplementary Fig. 3c). Xenopus incenp
is targeted to chromatin through phosphorylation of both H2A and H3 and
thus may have affinity for both CENP-A and H3 chromatin, and
plk1 associates with chromatin in Xenopus egg extract independent of
the kinetochore. Furthermore, reconstituted chromatin segments are
unlikely to generate paired sister chromatids with inner centromeres
because naked DNA and linear DNA replicate inefficiently in these
egg extracts. The specific recruitment of the centromere proteins
cenp-c, cenp-n and cenp-k, however, indicates that reconstituted
CENP-A chromatin arrays can support essential steps in the
centromere assembly process in vitro.
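
The quantification scheme stated in the Fig. 1c legend (normalization to histone H4, rescaling so that CENP-A arrays in CSF equal 1, s.e.m. over n = 3) can be written out as a short sketch. The intensity values below are invented purely to make the code run and are not data from this study. The function names are hypothetical, and the statistical test behind the reported P values is not specified in the text, so it is not reproduced here.

```python
from math import sqrt
from statistics import mean, stdev


def normalise_to_h4(protein_signal, h4_signal):
    """Divide each array-associated signal by the H4 signal of the same sample."""
    return [p / h for p, h in zip(protein_signal, h4_signal)]


def rescale_to_reference(groups, reference):
    """Rescale every group so that the mean of the reference group equals 1."""
    ref_mean = mean(groups[reference])
    return {name: [v / ref_mean for v in values] for name, values in groups.items()}


def sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(values) / sqrt(len(values))


# Made-up intensities (arbitrary units) for n = 3 experiments:
# (cenp-c signal, H4 signal) per condition.
raw = {
    "CENP-A, CSF": ([900.0, 1100.0, 1000.0], [500.0, 550.0, 520.0]),
    "H3, CSF": ([120.0, 90.0, 150.0], [510.0, 480.0, 530.0]),
}
normalised = {name: normalise_to_h4(sig, h4) for name, (sig, h4) in raw.items()}
rescaled = rescale_to_reference(normalised, "CENP-A, CSF")

for name, values in rescaled.items():
    print(f"{name}: mean = {mean(values):.2f}, s.e.m. = {sem(values):.2f}")
```

Dividing by H4 first corrects for differences in the amount of chromatin on each bead, so the rescaled values compare protein recruitment per unit of chromatin rather than per bead.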

Functional kinetochores assemble on sperm chromatin in metaphase
Xenopus egg extract. At high sperm concentration, microtubule
depolymerization causes mitotic checkpoint activation, resulting in the
increased association of checkpoint proteins with kinetochores and
cell-cycle arrest. We tested whether reconstituted CENP-A chromatin
arrays support kinetochore assembly and checkpoint protein binding
after microtubule depolymerization. We added CENP-A or H3 arrays
to CSF-arrested egg extracts and then cycled the extracts through
interphase and back into mitosis, in the presence or absence of
nocodazole, as outlined in Fig. 2a and demonstrated in Supplementary
Fig. 4a. The constitutive centromere protein cenp-c and the
microtubule-binding kinetochore protein ndc80 bound to CENP-A arrays in
the presence or absence of nocodazole (Fig. 2b, c and Supplementary
Fig. 4b). The spindle assembly checkpoint proteins cenp-e, mad2, rod
(also known as kntc1) and zw10 associated with CENP-A chromatin at
intermediate levels in the absence of nocodazole but upon microtubule
depolymerization their binding increased two- to fourfold (Fig. 2b).
Western blot analysis showed that cenp-c and ndc80 are precipitated
with CENP-A arrays independent of microtubule depolymerization.
Xenopus zw10 and rod are enriched on CENP-A arrays upon nocodazole
treatment in metaphase, regardless of whether the extract has been
cycled through interphase (Fig. 2c). These results indicate that CENP-A
chromatin arrays respond to microtubule depolymerization by recruiting
mitotic checkpoint proteins (Fig. 2b, c and Supplementary Fig. 4b).

Microtubule binding is a hallmark of kinetochore function and
decondensed sperm chromatin efficiently supports spindle formation
in egg extracts (Fig. 3a, left). However, chromatin assembled on naked
DNA induces spindle formation in Xenopus egg extracts independent
of kinetochores. When we added CENP-A and H3 chromatin beads
into mitotic egg extract we observed microtubule polymerization
around the majority of CENP-A arrays but only around a subset of
H3 arrays (Fig. 3a, left). We quantified the amount of microtubule
polymer associated with each type of array and found significantly more
microtubules associated with CENP-A chromatin beads (Fig. 3b and
Supplementary Fig. 5a). This indicates that CENP-A chromatin
preferentially stabilizes microtubules or promotes their polymerization.
We observed heterogeneous microtubule structures around the
CENP-A chromatin beads ranging from bipolar spindles to stabilized
microtubules or microtubule bundles (Fig. 3a and Supplementary
Fig. 5a, b). A second property of functional kinetochores is that
kinetochore-associated microtubule bundles (k-fibres) are stable to cold
treatment, which depolymerizes non-kinetochore microtubules. We
asked whether kinetochores assembled on CENP-A chromatin could
stabilize microtubules to cold shock by incubating the microtubule
assembly reactions for 10 min at 4 °C. We found that kinetochores
assembled on CENP-A chromatin arrays stabilized microtubules to
cold shock similar to kinetochores assembled on native sperm
chromatin, whereas H3 chromatin arrays did not (Fig. 3a, c and
Supplementary Fig. 5c). When we completely depolymerized microtubules
with nocodazole we observed mad2 recruitment to native sperm
centromeres and CENP-A chromatin beads but not H3 chromatin
beads (Fig. 3a, c and Supplementary Fig. 5c). These results indicate that
CENP-A chromatin arrays, similar to native sperm chromatin,
assemble functional kinetochores that promote microtubule binding,
k-fibre stabilization and spindle checkpoint function (Fig. 3a).

Figure 2 | CENP-A chromatin specifically recruits kinetochore proteins as a
response to a mimic of kinetochore detachment from microtubules. a, A
schematic showing the experimental procedure. b, Quantification of
immunofluorescence analysis of cenp-c, ndc80, cenp-e, mad2, rod or zw10
recruitment to chromatin arrays with (+) and without (−) nocodazole (NOC).
The levels are rescaled so that CENP-A arrays with nocodazole are set at 1.
Error bars represent s.e.m., n = 3 (P < 0.05 between (−) and (+) nocodazole
for cenp-e, mad2, rod and zw10 binding to CENP-A chromatin arrays).
c, Western blot analysis of cenp-c, ndc80, rod and zw10 recruitment to
CENP-A (HsCENP-A) and H3 chromatin arrays with and without nocodazole in CSF
and cycled egg extracts. H4 levels are shown as a loading control.

In cells, unattached kinetochores activate the mitotic checkpoint and 
delay mitotic exit until all chromosomes are properly attached and 
aligned’*’’. We tested whether kinetochores assembled on CENP-A 
chromatin arrays could generate a mitotic checkpoint response to 
microtubule depolymerization and delay the cell cycle. We mixed 
CENP-A and H3 chromatin with CSF extracts, cycled the reactions 
through interphase and then cycled them back into mitosis in the 
presence or absence of nocodazole (Fig. 2a). We then released the 
extract from mitosis into interphase a second time and monitored 
the kinetics of this transition by measuring the mitosis-specific 
phosphorylation of wee1 (phospho-wee1) (Fig. 3d). On release from mitosis, 
phospho-wee1 levels rapidly declined and were undetectable after 
30 min in control extracts containing CENP-A chromatin or H3 chro- 
matin, as well as in extracts containing H3 chromatin in the presence of 
Figure 3 | Kinetochores assembled on reconstituted CENP-A chromatin 
bind microtubules and generate a mitotic checkpoint signal. 

a, Representative images of microtubule polymerization induced by sperm or 
reconstituted CENP-A and H3 chromatin. Microtubules (green) and mad2 
(magenta) levels are shown. Scale bar, 10 µm. b, Quantification of tubulin and 
DNA associated with CENP-A and H3 chromatin beads. Error bars represent 
s.e.m., n = 5. c, Quantification of tubulin and mad2 levels associated with 
CENP-A and H3 chromatin beads after cold shock (4 °C) and nocodazole 
(NOC) treatment. Error bars represent s.e.m., n = 5. d, Western blot showing 
phospho-wee1 (p-wee1) levels as an indicator of the cell-cycle stage and tubulin 
levels as a loading control. Samples from different time points after release from 
mitotic arrest are shown for CENP-A and H3 chromatin arrays, each incubated 
with nocodazole (+) or with DMSO (−) as a control. e, Quantification of four 
independent experiments showing the phospho-wee1 signal intensity (p-wee1 
signal) over time (min). Error bars represent s.e.m., n = 4. 


nocodazole (Fig. 3d, e). In extracts containing CENP-A chromatin 
and nocodazole, the phospho-wee1 signal increased until 20 min after 
calcium addition and subsequently declined until 40 min after calcium 
addition to a level only slightly lower than that before release (Fig. 3d, e). 
In the presence of CENP-A chromatin and nocodazole, cyclin B levels 
rapidly declined but then stabilized, similar to the response observed for 
native sperm chromatin”. However, cyclin B was not stabilized in the 
presence of H3 chromatin and nocodazole (Supplementary Fig. 5d, e). 
We estimate that the number of CENP-A nucleosomes we are adding to 
the egg extract exceeds the CENP-A nucleosome concentration required 
to activate the checkpoint using sperm nuclei”. The lower efficiency of 
reconstituted arrays for checkpoint signalling may be due to the com- 
paratively short length of our reconstituted CENP-A chromatin to 
native CENP-A chromatin or the lack of replicated sister chromatids 
and inner centromeres important for tension-dependent checkpoint 
activation. Despite these differences, our synthetic CENP-A chromatin 
supports a mitotic checkpoint response that mimics the response of 
native kinetochores to microtubule depolymerization. 

The reconstituted chromatin system we have developed provides a 
distinct experimental advantage over native metazoan centromeric 
chromatin because the chromatin template can be easily manipulated 
to dissect the roles of histone proteins in centromere function. A 
central question in centromere function is how CENP-A chromatin 
directs the assembly of the centromere and kinetochore. CENP-N 
recognizes the CATD region of the CENP-A nucleosome while 
CENP-C binds the C-terminal tail of CENP-A®®. However, the relative 
importance of these two recognition mechanisms in centromere and 
kinetochore assembly is incompletely understood. 
We generated chromatin arrays containing chimaeric CENP-A/H3 
proteins to ask how the CENP-A CATD domain and the CENP-A 
C terminus influence centromere and kinetochore assembly (Fig. 4a). 
We characterized the level of histone exchange and/or loss from the 
arrays during incubation in extracts and found that the majority of 
recombinant human CENP-A nucleosomes were stable during the 
incubation, indicating low exchange and/or loss rates (Supplemen- 
tary Fig. 6a, b). We detected a low level of phosphorylated histone 
H3 on CENP-A chromatin arrays in CSF extract (11.7% ± 7% compared 
to H3 arrays) and in extract that had been cycled through interphase 
and back into mitosis (22% ± 13% compared to H3 arrays) 
(Supplementary Fig. 6c, d). The chimaeric arrays containing CENP-A 
with the histone H3 tail (CENP-A + H3C) exhibited similar levels of 
exchange (Supplementary Fig. 6c, d). The Xenopus cenp-a present in 
the extract did not appreciably exchange onto any of the arrays (detec- 
tion limit ~5-10% exchange) (Supplementary Fig. 6c). The absence of 
gross rearrangements or bulk histone exchange suggests that chro- 
matin arrays can be used to dissect how individual domains of 
CENP-A influence kinetochore assembly. 
Figure 4 | The CENP-A C terminus is required for centromere and 
kinetochore assembly in Xenopus egg extract. a, A schematic showing the 
different CENP-A/H3 chimaeras used in this study. The numbers at the top 
represent the amino acid (aa) within human CENP-A. b, Quantification of 
immunofluorescence analysis of cenp-c, cenp-k and cenp-n recruitment to 
wild-type and chimaeric arrays. The relative amounts of each centromere 
protein bound to the arrays are shown relative to CENP-A arrays set to 1. Error 
bars represent s.e.m., n = 3 (P = 0.05 for all proteins binding to CENP-A arrays 
compared to chimaeric arrays except for the H3 arrays containing the CENP-A 
C terminus). c, Quantification of immunofluorescence analysis of ndc80, cenp-e, 
mad2 recruitment to chimaeric chromatin arrays with (+) and without (−) 
nocodazole (NOC). Values are displayed relative to CENP-A arrays in the 
presence of nocodazole set to 1. Error bars represent s.e.m., n = 4. The 
efficiencies of recruitment of kinetochore proteins to CENP-A and H3 + CAC 
arrays in nocodazole were not statistically distinguishable (P = 0.26 for ndc80, 
cenp-e and mad2). d, Quantification of microtubule binding to CENP-A, H3, 
H3 + human CAC (HsCAC) and H3 + Xenopus CAC (XICAC) chromatin 
arrays represented as percentage of beads associated with tubulin levels above 
threshold (dark grey bars, left y-axis). Average DNA levels on chromatin beads 
are shown representing the levels of chromatin arrays bound to beads (light grey 
bars, right y-axis). Error bars represent s.e.m., n = 4 for CENP-A and H3 arrays, 
n = 5 for H3 + human CAC arrays and n = 2 for H3 + Xenopus CAC arrays. 
e, Western blot analysis shows phospho-wee1 (p-wee1) levels as an indicator of 
the cell-cycle stage at 0 min and 40 min after mitotic exit. Tubulin levels are 
shown as a loading control. f, Quantification of the phospho-wee1 signal 
intensity over time. Error bars represent s.e.m., n = 5. 
Using our in vitro centromere and kinetochore assembly assay, we 
found that cenp-c bound with equal efficiency to chromatin arrays 
assembled with either wild-type CENP-A or with chimaeras of histone 
H3 with the CENP-A C-terminal six amino acids (H3 + CAC) but not 
CENP-A + H3C (Fig. 4b and Supplementary Fig. 7a, left). This 
demonstrates that the CENP-A C terminus is necessary and sufficient 
for recruiting cenp-c to CENP-A chromatin arrays in egg extracts, as it 
is for CENP-A mononucleosome binding in vitro’. 

Xenopus cenp-k depends on cenp-c for its association with sperm 
centromeres and cenp-k also associated with the wild-type and 
H3 + CAC arrays (Fig. 4b and Supplementary Fig. 7a). Surprisingly, 
we found that H3 + CAC arrays recruited cenp-n as efficiently as 
wild-type CENP-A arrays, even though these arrays lack the CATD 
recognition element for CENP-N°. Xenopus cenp-n binding to either 
CENP-A + H3C or H3 + CATD arrays was no better than its binding 
to H3 chromatin arrays, indicating that the CENP-A C terminus is 
required for cenp-n association with CENP-A chromatin in Xenopus 
egg extract (Fig. 4b and Supplementary Fig. 7a). The lack of Xenopus 
cenp-n binding to H3 + CATD and CENP-A + H3C chromatin 
arrays is not due to species differences because Xenopus cenp-n binds 
human CENP-A mononucleosomes in vitro in the absence of CENP-C 
(Supplementary Fig. 7b). The association of cenp-n and cenp-k with 
chromatin arrays was dependent on cenp-c, as cenp-c depletion from 
the extract (Supplementary Fig. 8a) reduced the binding to back- 
ground levels (Supplementary Fig. 8b, c). This was not due to depletion 
of cenp-n or cenp-k by cenp-c, as we have previously shown that 
complementation of cenp-c-depleted extracts restores cenp-k binding 
and CENP-K is known to depend on CENP-N for its centromere 
localization®’*”**°. The dependence of CENP-N on CENP-C for its 
localization to CENP-A arrays may reflect a role for CENP-C in alter- 
ing the geometry of centromeric chromatin to promote access of 
CENP-N to CENP-A nucleosomes, or it may reflect the assembly of 
CENP-N into the larger CCAN complex recruited to the centromere 
via CENP-C. Our results demonstrate that cenp-c recognition of the 
CENP-A C terminus is necessary and sufficient for cenp-n and cenp-k 
association with chromatin arrays in Xenopus egg extract. 

We analysed the chromatin requirements for mitotic kinetochore 
formation using the experimental strategy illustrated in Fig. 2a. The 
kinetochore proteins ndc80, cenp-e, mad2, rod and zw10 are efficiently 
recruited to wild-type and H3 + CAC chromatin arrays, but not to 
CENP-A + H3C or H3 + CATD chromatin arrays (Fig. 4c, Supplementary 
Fig. 9a and Supplementary Fig. 10a). Similar to wild-type 
CENP-A chromatin, only the checkpoint proteins cenp-e, mad2, zw10 
and rod increased in their association with H3 + CAC after microtubule 
depolymerization (Fig. 4c, Supplementary Fig. 9a and Supplementary 
Fig. 10a). As with wild-type CENP-A arrays, the 
H3 + CAC arrays showed an increase in associated microtubule polymer, 
indicating that the C terminus of CENP-A directs the formation of 
microtubule binding or stabilization activity (Fig. 4d). Human and 
Xenopus CENP-A differ by two amino acids in their C-terminal tail 
(Supplementary Fig. 2a) and chimaeric nucleosome arrays containing 
the Xenopus C-terminal tail of cenp-a fused to H3 (H3 + Xenopus 
CAC) were as efficient in cenp-c recruitment and microtubule 
binding as human H3 + CAC arrays (Fig. 4d and Supplementary Fig. 10b), 
indicating that the mode of interaction between CENP-C and CENP-A 
is conserved. 

We assayed the ability of chimaeric nucleosome arrays to promote 
mitotic checkpoint arrest after microtubule depolymerization and 
found that H3 + CATD and CENP-A + H3C did not delay the exit 
from mitosis but that H3 + CAC did (Fig. 4e, f). The delay of mitotic 
exit caused by H3 + CAC arrays was less effective than that of CENP-A 
chromatin arrays, indicating that regions of CENP-A in addition to the 
C terminus increase the effectiveness of checkpoint signalling, possibly 
by stabilizing CCAN and kinetochore protein interactions with 
chromatin (Fig. 4e, f). Taken together, our data demonstrate that 
the primary chromatin determinant for functional centromere and 
kinetochore assembly is the C terminus of CENP-A and its recognition 
by CENP-C. 

Here we have shown that reconstituted CENP-A chromatin, in the 
absence of native centromeric DNA, is necessary and sufficient for 
centromere and kinetochore assembly. Our data imply that short 
domains of CENP-A chromatin are sufficient for assembling core com- 
ponents of the centromere and kinetochore in the absence of higher- 
order organization of centromeric chromatin and interspersed domains 
of H3 chromatin. 

Using our in vitro system, we have directly assessed how domains of 
CENP-A participate in centromere and kinetochore assembly, even 
when the mutations we analyse would be expected to be lethal in vivo. 
We find that the CENP-A C terminus is both necessary and sufficient for 
the recruitment of centromere and kinetochore proteins, for microtubule 
binding and for a checkpoint response to microtubule depolymerization. 
We suggest that CENP-A performs two functions that can be separated 
molecularly: (1) the CENP-A CATD provides a recognition mechanism 
for targeting of CENP-A to centromeres to maintain centromeric chro- 
matin?**; and (2) the CENP-A C-terminal tail domain recruits the 
conserved centromere protein CENP-C to promote centromere and 
kinetochore assembly. We envision the use of more complex chromatin 
templates to understand the importance of higher-order chromatin 
organization and regulatory modifications in centromere assembly 
and function. 


METHODS SUMMARY 


Histone proteins and chimaeras were purified as described previously**!* and 
assembled onto a biotin end-labelled tandem array of 19 high-affinity nucleosome 
positioning sequences (19X601) by salt dialysis. Chromatin arrays were bound to 
streptavidin-coated magnetic Dynabeads (Invitrogen). X. laevis extracts were pre- 
pared as previously described’® and centromere protein binding to chromatin 
arrays was performed in freshly prepared CSF egg extract for 1 h with or without 
calcium addition. Arrays were fixed in formaldehyde and stained for centromere 
proteins by indirect immunofluorescence. Kinetochore and checkpoint protein 
assembly was assayed by adding arrays to extracts released into interphase with 
calcium for 80 min followed by re-addition of CSF extract in the presence or 
absence of nocodazole (10 µg ml⁻¹) for another 90 min. To analyse microtubule 
binding, chromatin arrays were incubated in CSF for 90 min. Reactions were 
sedimented through a glycerol cushion onto a coverslip followed by tubulin immu- 
nofluorescence. Chromatin-array-dependent inhibition of mitotic exit was 
assayed as described for kinetochore protein binding, but calcium was added a 
second time to release extracts into interphase. The cell-cycle state was monitored 
by western blotting using anti-phospho-wee1 antibody, provided by J. E. Ferrell. 

Images were collected as 13 axial planes at 2 µm intervals on a Nikon Eclipse-80i 
microscope using a ×60, 1.4 NA PlanApo oil lens and a CoolSnapHQ CCD camera 
(Photometrics) with MetaMorph software (MDS Analytical Technologies). Axial 
stacks were maximum intensity projected and quantified using custom software. 
For normalization of each experiment, a separate histone H4 staining was per- 
formed to quantify the exact array coupling efficiency. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 

Histone expression. CENP-A/H4 and H3/H4 wild-type and chimaeric tetramers, as 
well as H2A and H2B dimers were expressed and purified as described previously**'**?. 
Preparation of biotinylated array DNA. A tandem array of 19 copies of the high- 
affinity nucleosome positioning sequence (19X601)'**? was digested with EcoRI, 
XbaI, DraI and HaeII (NEB) overnight to excise the 19-nucleosome positioning 
sequence array and to digest the remaining backbone DNA to smaller DNA 
fragments. The array DNA was then purified by PEG precipitation and dialysed 
against 10 mM Tris-HCl pH 8.0, 0.25 mM EDTA as previously described". 

The array DNA was end labelled with biotin by end filling the EcoRI and XbaI 
sites using Klenow DNA polymerase for 4 h at 37 °C in a reaction containing 35 µM 
Biotin-14-dATP (Invitrogen), α-thio-dTTP and α-thio-dGTP (Chemcyte) and 
dCTP. The labelled DNA was then purified using a PCR fragment purification kit 
(Qiagen). The biotinylation efficiency was determined by adding FITC-streptavidin 
(final concentration of 10 µg ml⁻¹) to 500 ng of purified array DNA and monitoring 
the fraction of gel-shifted DNA after migration in a 0.7% agarose gel. 
Chromatin array assembly. To assemble chromatin arrays, biotinylated DNA, 
CENP-A/H4 or H3/H4 tetramers and H2A/H2B dimers were mixed at a 
stoichiometry of 1:1:2.2 or 1:0.9:2.2, respectively, in high-salt buffer (10 mM Tris-HCl pH 
7.5, 0.25mM EDTA, 2M NaCl) and then dialysed into low-salt buffer (10 mM 
Tris-HCl pH 7.5, 0.25 mM EDTA, 2.5 mM NaCl) over 60-70 h at 4 °C. Final array 
DNA concentration was typically 0.15 mg ml⁻¹ to 0.2 mg ml⁻¹. 

To assess the efficiency of nucleosome assemblies, arrays were digested at room 
temperature (approximately 22 °C) overnight with AvaI in a low-magnesium buffer 
(50 mM potassium acetate, 20 mM Tris-acetate, 0.5 mM magnesium acetate, 1 mM 
dithiothreitol, pH 7.9). Digested chromatin arrays were supplemented with glycerol 
(20% final concentration) and separated on a native 5% acrylamide gel in 0.5× Tris/ 
Borate/EDTA buffer for 80 min at 10 mA. Gels were stained with EtBr (1 µg ml⁻¹) 
to visualize DNA. 

Coupling of biotinylated chromatin arrays to Dynabeads. Biotinylated chro- 
matin arrays were coupled to prewashed streptavidin-coated magnetic Dynabeads 
(Invitrogen) at a ratio of 10 µg DNA to 1 mg beads in 50 mM Tris-HCl pH 8.0, 
75 mM NaCl, 0.25 mM EDTA, 2.5% polyvinyl alcohol (PVA) and 0.05% Triton- 
X-100 for 1-2h. The beads were then equilibrated in 75 mM Tris-HCl pH 8.0, 
75mM NaCl, 0.25mM EDTA, 0.05% Triton-X-100 and either used directly or 
stored at 4 °C for later use. 

X. laevis egg extracts. X. laevis CSF extracts were prepared as previously 
described’***. To assess the binding of centromeric proteins to chromatin arrays in 
CSF and interphase egg extracts, chromatin arrays were mixed with freshly prepared 
CSF egg extract with or without CaCl₂ (final concentration 0.6 mM) at a nucleosome 
concentration of ~100 nM unless stated otherwise. The reactions were incubated for 
1 h at 4 °C or at 16-20 °C in a water bath; the arrays were re-isolated from extracts by 
exposure to a magnet and then washed three times in 1× CSF-XB buffer (10 mM 
HEPES pH 7.7, 2 mM MgCl₂, 0.1 mM CaCl₂, 100 mM KCl, 5 mM EGTA, 50 mM 
sucrose) supplemented with 0.05% Triton-X-100. Chromatin arrays were fixed in 
CSF-XB buffer, 0.05% Triton-X-100, 2% formaldehyde for 5 min. After fixation, 
chromatin arrays were washed into antibody dilution buffer (20 mM Tris-HCl pH 7.5, 
150 mM NaCl, 0.1% Triton-X-100, 2% BSA) and analysed by immunofluorescence. 

Kinetochore and spindle checkpoint protein assembly were analysed by mixing 
chromatin arrays with CSF extract and CaCl₂ (final concentration 0.6 mM). 
Reactions were incubated at 16-20 °C for 80 min to allow extracts to release into 
interphase and mixed every 15 min. One volume of fresh CSF extract was added 
together with nocodazole (or DMSO) at 10 µg ml⁻¹ and samples were held at 
16-20 °C for another 90 min. After 170 min total incubation time, samples for 
immunofluorescence analysis were washed and fixed as described above. 

The cell-cycle state was verified by loading 2 µl of extract from all relevant time points 
onto SDS-PAGE, followed by western blotting using the anti-phospho-wee1 
antibody™. 

To assess the ability of chromatin arrays to inhibit mitotic exit, arrays were 
mixed with CSF extract and CaCl₂ (final concentration: 0.6 mM). The samples 
were incubated for 80 min to induce the release into interphase. In the next step, 
one volume of fresh CSF extract, supplemented with nocodazole/DMSO, was 
added to cycle the extract back into a mitotic arrest. After 90 min, CaCl₂ was added 
again to release the extract from mitotic arrest. Western blot samples were taken at 
all indicated time points and processed as described. 

To analyse microtubule binding by CENP-A and H3 chromatin arrays, chro- 
matin arrays were mixed with CSF extract and incubated for 90 min at 18-20 °C. 
During incubation samples were mixed every 15 min. Reactions were fixed for 
10 min in 2.5% formaldehyde, sedimented through a glycerol cushion onto cover- 
slips and post-fixed for 5 min in ice-cold methanol followed by immunofluores- 
cence analysis*’. To assay for mad2 levels and microtubule stabilization, reactions 
were either supplemented with nocodazole at a final concentration of 10 µg ml⁻¹ 
or shifted to 4°C for 10 min after the 90 min incubation time. 


Immunodepletion. Depletion of Xenopus cenp-c from Xenopus egg extracts was 
performed as described previously”*. 

Cloning and antibody generation. The X. laevis cenp-n cDNA clone (GenBank 
accession number BC084956) was purchased from American Type Culture Collection. 
Peptides against Xenopus cenp-n (acetyl-CPHKARNSFKITEKR-amide) were synthe- 
sized by Bio-Synthesis and peptide antibodies were generated as previously described”. 
Immunofluorescence. For immunofluorescence analysis, fixed chromatin arrays 
were bound to poly-L-lysine-coated acid-washed coverslips. The following primary 
antibodies were used for immunofluorescence staining and typically incubated at 
4°C overnight: anti-human CENP-A”® was directly coupled to Alexa 647 
(Molecular Probes), anti-H4 (Abcam), anti-Xenopus cenp-c, anti-Xenopus cenp-e, 
anti-Xenopus cenp-k, anti-Xenopus cenp-n and anti-tubulin (Dm1a; Sigma). 
Rabbit antibodies were generated against the full-length Xenopus polo kinase 
made in Sf9 cells and a GST fusion to the first 379 amino acids of Xenopus incenp 
made in E. coli. The anti-mad2 antibody was provided by A. Murray (Harvard 
University), and R.-H. Chen (Institute of Molecular Biology, Academia Sinica), the 
anti-Xenopus zw10 and anti-Xenopus rod antibodies were provided by G. Kops 
(University Medical Center Utrecht) and the anti-Xenopus ndc80 antibody was 
provided by P. Todd Stukenberg (University of Virginia). Alexa-conjugated 
secondary antibodies were used at 1 µg ml⁻¹ (Molecular Probes). Propidium iodide at 
1 µg ml⁻¹ or Hoechst at 10 µg ml⁻¹ was used to visualize DNA. 

Microscopy and analysis. Images were collected on a Nikon Eclipse 80i micro- 
scope using a ×60, 1.4 NA Plan Apo VC oil immersion lens, a Sedat Quad filter set 
(Chroma Technology) using MetaMorph software (MDS Analytical Technologies) 
and a charge-coupled device camera (CoolSnapHQ; Photometrics). Thirteen axial 
planes at 2 µm intervals were acquired with an MFC-2000 Z-axis drive (Applied 
Scientific Instrumentation). Axial stacks were maximum intensity projected and 
then quantified using custom software (Matlab) to identify beads in each image and 
to quantify the integrated intensity for each channel after background subtraction. 
Briefly, the propidium iodide stained (DNA) channel was used to find beads. Bead 
centroids were found by filtering the image using a structuring element that had a 
peak at a 17 pixel radial distance from the structuring element centre, correspond- 
ing to the bright ring seen around the edges of the beads. A 35 pixel diameter circle 
around the centroid of each bead identified was used as the region of interest for that 
bead. After beads were identified, regions of interest were transferred automatically 
to the remaining channels and the integrated signal intensity was calculated for each 
bead in each channel, normalized to the area of the bead region (which was uniform 
except in cases of partially overlapping beads), and background corrected using an 
average of three bead-sized regions manually chosen to be away from any beads. For 
each experiment, at least three images per coverslip were acquired and 20-300 
beads were analysed per image. For the normalization of each experiment, a sepa- 
rate histone H4 staining was performed to quantify the exact coupling efficiency for 
each type of chromatin array and for each experiment. 
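
The per-bead quantification described above can be illustrated with a minimal Python sketch. It is not the authors' custom Matlab software: the function names, the synthetic image and the small 3-pixel region diameter are assumptions chosen for illustration (the text specifies a 35-pixel diameter), and the bead-finding step that uses the ring-shaped structuring element is not reproduced. The sketch covers three steps only: maximum-intensity projection of an axial stack, area-normalized intensity inside a circular region of interest, and subtraction of the mean of three bead-free background regions.

```python
def max_project(stack):
    """Maximum-intensity projection of a z-stack given as a list of 2D lists."""
    rows, cols = len(stack[0]), len(stack[0][0])
    return [[max(plane[r][c] for plane in stack) for c in range(cols)]
            for r in range(rows)]


def roi_mean(image, centre, diameter):
    """Mean intensity inside a circular region of interest (area-normalized)."""
    cr, cc = centre
    radius = diameter / 2.0
    total, area = 0.0, 0
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if (r - cr) ** 2 + (c - cc) ** 2 <= radius ** 2:
                total += value
                area += 1
    if area == 0:
        raise ValueError("region of interest contains no pixels")
    return total / area


def bead_signal(image, centre, diameter, background_centres):
    """Area-normalized bead intensity minus the mean of the background regions."""
    background = sum(roi_mean(image, b, diameter)
                     for b in background_centres) / len(background_centres)
    return roi_mean(image, centre, diameter) - background


# Synthetic two-plane stack (illustrative values only, not measured data).
size = 12
plane_a = [[10.0] * size for _ in range(size)]
plane_b = [[5.0] * size for _ in range(size)]
for r in range(2, 5):
    for c in range(2, 5):
        plane_a[r][c] = 50.0   # a small bright "bead" in plane A
plane_b[3][3] = 70.0           # brighter bead centre in plane B

projection = max_project([plane_a, plane_b])
# Three bead-sized background regions chosen away from the bead.
signal = bead_signal(projection, (3, 3), 3, [(8, 8), (8, 3), (3, 8)])
```

Under these assumptions, the same regions of interest would be reused for every fluorescence channel, and the resulting per-bead values would then be scaled by the histone H4 signal that reports array coupling efficiency.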

Immunofluorescence microscopy images of the microtubule binding assays that 
were subjected to deconvolution were acquired with an Olympus IX70 microscope. 
The microscope was outfitted with a Deltavision Core system (Applied Precision) 
using an Olympus ×60 1.4 NA Plan Apo lens, a Sedat Quad filter set (Semrock) and a 
CoolSnap HQ CCD Camera (Photometrics). The microscope was controlled via 
softworx 4.1.0 software (Applied Precision) and images were deconvolved using 
softworx v. 4.1.0 (Applied Precision). Microtubule quantification was performed 
using a modification of the same software used for centromere protein quantification. 
Immunoblotting. Western blot samples were separated by SDS-PAGE and trans- 
ferred onto PVDF membrane (Bio-Rad) in CAPS transfer buffer (10 mM 
3-(cyclohexylamino)-1-propanesulfonic acid, pH 11.3, 0.1% SDS and 20% meth- 
anol). The following primary antibodies were typically incubated overnight at 
4 °C: anti-Xenopus cenp-c*, anti-tubulin (Dm1a, Sigma), anti-H4 (Abcam), 
anti-phospho H3 (Ser10) (Millipore), anti-phospho-wee1. The anti-phospho-wee1 
antibody was provided by J. E. Ferrell (Stanford University)**. For additional primary 
antibodies, western blot samples were transferred onto PVDF membrane (Bio-Rad) 
in 20 mM Tris-Base, 200 mM glycine. Alexa fluorophore conjugated anti-rabbit or 
anti-mouse secondary antibodies (Molecular Probes) were used according to 
manufacturer’s specification. Fluorescence was detected on a Typhoon 9400 
Variable Mode Imager (Amersham Biosciences) and quantified using ImageJ 
(http://rsb.info.nih.gov/ij/). Actin antibodies were provided by J. Theriot (Stanford 
University) and anti-cyclin B was purchased from Santa Cruz Biotechnology. 

In vitro binding of centromere proteins to chromatin arrays. Human and 
Xenopus CENP-C were in vitro translated (IVT) in rabbit reticulocyte extracts 
in the presence of 10 mCi ml⁻¹ [³⁵S]methionine (Perkin Elmer) using the TnT 
Quick-Coupled Transcription/Translation system (Promega) according to the 
manufacturer’s instructions. For a binding reaction (60 µl total volume), 5 µl of 
each IVT protein were mixed with chromatin arrays in bead buffer (75 mM Tris- 
HCl pH 7.5, 50mM NaCl, 0.25mM EDTA, 0.05% Triton-X-100). The final 
nucleosome concentration per reaction was 60 nM. Reactions were incubated at 
4 °C for 1 h. The beads were washed three times with bead buffer and resuspended 
in 4× SDS loading buffer. Samples were separated by SDS-PAGE, Coomassie 
stained and after drying scanned using a phosphorimager (Typhoon 4200, 
Amersham Biosciences) and quantified using ImageJ (http://rsb.info.nih.gov/ij/). 
Statistical analysis. In each experiment, the relative levels of proteins associated 
with the chromatin arrays were normalized to values for wild type CENP-A arrays 
set to 1. For calculation of P values each data set was anchored at 1 and then log 
transformed followed by calculation of P values using a Student’s t-test*’. 
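
One way to read this procedure is sketched below in Python. It assumes that each experiment's value is divided by the paired wild-type CENP-A value (the anchoring at 1), that the ratios are log transformed, and that the log ratios are then tested against zero with a one-sample Student's t statistic. The function names and the input numbers are hypothetical, and the exact test variant used by the authors is not stated in the text. Only the t statistic and degrees of freedom are computed here; the P value would be read from the t distribution.

```python
import math
import statistics


def normalize_to_reference(values, reference):
    """Express each replicate relative to its paired reference (reference = 1)."""
    if len(values) != len(reference):
        raise ValueError("values and reference must be paired per experiment")
    return [v / r for v, r in zip(values, reference)]


def one_sample_t(log_ratios, mu=0.0):
    """Return (t statistic, degrees of freedom) for the null hypothesis mean == mu."""
    n = len(log_ratios)
    if n < 2:
        raise ValueError("need at least two replicates")
    mean = statistics.mean(log_ratios)
    sem = statistics.stdev(log_ratios) / math.sqrt(n)
    return (mean - mu) / sem, n - 1


# Hypothetical raw intensities from three experiments (illustrative numbers only).
cenp_a_arrays = [100.0, 120.0, 90.0]
h3_arrays = [20.0, 30.0, 15.0]

ratios = normalize_to_reference(h3_arrays, cenp_a_arrays)
log_ratios = [math.log(r) for r in ratios]   # a ratio of 1 maps to 0
t_stat, dof = one_sample_t(log_ratios)
```

Working on the log scale treats a twofold increase and a twofold decrease symmetrically, which is a common reason for transforming ratio data before a t-test.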
Antibiotic resistance is ancient 


Vanessa M. D’Costa’**, Christine E. King***, Lindsay Kalan’, Mariya Morar!?, Wilson W. L. Sung", Carsten Schwarz’, 
Duane Froese®, Grant Zazula°, Fabrice Calmels”, Regis Debruyne’, G. Brian Golding*, Hendrik N. Poinar’*** & Gerard D. Wright"? 


The discovery of antibiotics more than 70 years ago initiated a 
period of drug innovation and implementation in human and 
animal health and agriculture. These discoveries were tempered 
in all cases by the emergence of resistant microbes’. This history 
has been interpreted to mean that antibiotic resistance in patho- 
genic bacteria is a modern phenomenon; this view is reinforced by 
the fact that collections of microbes that predate the antibiotic era 
are highly susceptible to antibiotics’. Here we report targeted 
metagenomic analyses of rigorously authenticated ancient DNA 
from 30,000-year-old Beringian permafrost sediments and the 
identification of a highly diverse collection of genes encoding 
resistance to β-lactam, tetracycline and glycopeptide antibiotics. 
Structure and function studies on the complete vancomycin resist- 
ance element VanA confirmed its similarity to modern variants. 
These results show conclusively that antibiotic resistance is a 
natural phenomenon that predates the modern selective pressure 
of clinical antibiotic use. 

Recent studies show that modern environmental and human commensal 
microbial genomes have a much larger concentration of antibiotic 
resistance genes than has been previously recognized*®. In addition, 
metagenomic studies have revealed diverse homologues of known 
resistance genes broadly distributed across environmental locales. 
This widespread dissemination of antibiotic resistance elements is 
inconsistent with a hypothesis of contemporary emergence and 
instead suggests a richer natural history of resistance’. Indeed, 
estimates of the origin of natural product antibiotics range from 
2 Gyr to 40 Myr ago”®, suggesting that resistance should be similarly 
old. Previous publications claim to have cultured resistant bacteria 
from Siberian permafrost (for example ref. 9), but these results remain 
contentious (see Supplementary Information). 

To determine whether contemporary resistance elements are modern 
or whether they originated before our use of antibiotics, we analysed 
DNA sequences recovered from Late Pleistocene permafrost sediments. 
The samples were collected east of Dawson City, Yukon, at the Bear 
Creek (BC) site (Fig. 1); prominent forms of ground ice (ice wedges and 
surface icings) are preserved in the exposure, immediately overlain by a 
distinctive volcanic ash layer, the Dawson tephra'®"' (Supplementary 
Table 1 and Supplementary Figs 1 and 2). The tephra has been dated at 
several sites in the area to about 25,300 radiocarbon (¹⁴C) years BP, or 
about 30,000 calendar years’””*. The cryostratigraphic context is similar 
to other sites in the area preserving relict permafrost and indicates that 
the permafrost has not thawed since the time of deposition (Sup- 
plementary Information). In the absence of fluid leaching, the site repre- 
sents an ideal source of uncontaminated and securely dated ancient 
DNA. 

Two frozen sediment cores (BC1 and BC4), 10cm apart, were 
obtained 50cm below the tephra. In accordance with appropriate 
protocols’’, we monitored contamination introduced during coring 
by spraying the drilling equipment and the outer surface of the cores 


with high concentrations of Escherichia coli harbouring the gfp (green 
fluorescent protein) gene from Aequorea victoria (Supplementary 
Information). 

After fracturing of the samples (Supplementary Fig. 3), total DNA 
was extracted from a series of five subsamples taken along the radius of 
each core (Supplementary Information). Quantitative polymerase 
Figure 1 | Stratigraphic profile and location of Bear Creek site. Elevation is 
given in metres above base of exposure. Permafrost samples from below 
Dawson tephra were dated to about 30 kyr BP. Preservation of the ice below and 
above the sample indicates that the sediments have not thawed since 
deposition. Silhouettes represent mammals and birds identified from ancient 
DNA sequences that are typical of the regional Late Pleistocene environment. 
aDNA, ancient DNA. 
chain reaction (qPCR) analysis confirmed extremely high yields of gfp 
on both core exteriors, with 0.1% or less of this amount at the centre 
(Supplementary Information and Supplementary Fig. 4). This sup- 
ports negligible leaching or cross-contamination during subsampling. 

A crucial step lending support for the authenticity of the ancient 
DNA was to confirm the presence of DNA derived from flora and fauna 
characteristic of a late Pleistocene age, and the absence of common 
modern or Holocene floral and faunal sources. To explore the vertebrate 
and plant diversity, we amplified fragments of the mitochondrial 12S 
rRNA and chloroplast trnL and rbcL genes (Supplementary Table 3). 
Amplicons were sequenced with the 454 GS-FLX platform and iden- 
tified by BLAST analysis of GenBank sequences (Supplementary 
Information). 

The vertebrate sequences included abundant Late Pleistocene 
megafauna such as Bison, Equus and Ovis, as well as rodents (Microtus 
and Ellobius) and the rock ptarmigan, Lagopus mutus (Supplementary 
Fig. 6 and Supplementary Table 5). Mammuthus was detectable at low 
copy numbers with the use of a mammoth-specific qPCR assay, which is 
consistent with the low ratio of these fossils relative to bison and horse in 
the region'"*. The rbcL and trnL sequences revealed many plant groups 
that are also well documented in Beringia, including the grasses Poa and 
Festuca, sage (Artemisia) and willow (Salix)'° (Supplementary Figs 7 and 
8, and Supplementary Tables 6 and 7). No sequences of common 
Holocene vertebrates (for example elk or moose) or plants (for example 
spruce) were identified despite sequence conservation across the primer- 
binding sites; these results are consistent with other reports’® that have 
argued against DNA leaching in permafrost sediments. 

We focused our investigation of bacterial 16S rRNA sequences on 
the Actinobacteria, known for their ability to synthesize diverse 
secondary metabolites and for harbouring antibiotic resistance genes’. 
Deep sequencing of 16S amplicons (Supplementary Information) 
revealed genera commonly found in soil and permafrost microbial 
communities!’, including Aeromicrobium, Arthrobacter and Frankia 
(Supplementary Fig. 9 and Supplementary Table 8). Analysis of con- 
taminant 16S sequences derived from extraction and PCR control re- 
actions (Supplementary Table 4) suggested that these do not 
contribute to the ancient DNA data set; in fact not only were the copy 
numbers 1,000-30,000-fold lower than from the permafrost extracts, 
but with the exception of unclassified bacteria there was also very 
little overlap in the genera identified (Supplementary Fig. 9 and 
Supplementary Table 8). Querying the permafrost sequences against
the contaminant data set with the use of BLAST further confirmed 
their disparity: only 1% of the reads had 95-100% identity to a con- 
taminant sequence, with a single sequence showing 100% identity. 
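
The comparison against the contaminant data set amounts to counting reads whose best hit falls in a given identity range. The sketch below assumes, hypothetically, that the best-hit percent identity for each read has already been parsed from the BLAST output (None where a read has no hit); it does not call BLAST itself.

```python
from typing import Optional, Sequence


def fraction_matching(identities: Sequence[Optional[float]],
                      low: float = 95.0,
                      high: float = 100.0) -> float:
    """Fraction of reads whose best contaminant hit has identity in [low, high]."""
    if not identities:
        raise ValueError("no reads supplied")
    hits = sum(1 for pid in identities if pid is not None and low <= pid <= high)
    return hits / len(identities)


def count_exact(identities: Sequence[Optional[float]]) -> int:
    """Number of reads with a 100% identical contaminant hit."""
    return sum(1 for pid in identities if pid is not None and pid >= 100.0)
```

Keeping the threshold as a parameter makes it easy to see how the overlap changes if a looser or stricter identity cut-off is chosen.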

We next developed a series of assays to detect genes encoding resist- 
ance to several major classes of antibiotic and representing diverse 
strategies of drug evasion (for example target modification, target pro- 
tection and enzymatic drug inactivation) (Supplementary Information). 
Determinants included the ribosomal protection protein TetM, which 
confers resistance to tetracycline antibiotics by weakening the inter-
action between the drug and the ribosome; the D-Ala-D-Ala dipeptide
hydrolase VanX, which is a component of the vancomycin resistance
operon; the aminoglycoside-antibiotic-modifying acetyltransferase
AAC(3); a penicillin-inactivating β-lactamase Bla (a member of the
TEM group of β-lactamases); and the ribosome methyltransferase
Erm, which blocks the binding of macrolide, lincosamide and type B 
streptogramin antibiotics. Amplification of vanX, tetM and bla frag- 
ments was successful, and triplicate PCR products from multiple 
extracts were cloned and multiple clones were sequenced. 

The β-lactamase sequences demonstrated amino-acid identities
between 53% and 84% with known determinants and clustered with
one of two groups of enzymes: characterized β-lactamases from strepto-
mycetes and uncharacterized β-lactamase-like hydrolytic proteins
(Fig. 2a and Supplementary Fig. 14). We identified several tetM-related 
genes in the permafrost, most of which were most closely related to the 
actinomycete subset of ribosomal protection proteins, including the 
biochemically characterized self-resistance element OtrA from the 
oxytetracycline producer Streptomyces rimosus'* (Fig. 2b). Most intri- 
guing was the identification of vanX gene fragments, which spanned 
the entire phylogenetic space of characterized vancomycin resistance 
determinants found in the clinic and in the environment. These branch 
away from the cellular dipeptidases that are the likely progenitors of the
vanX family (Supplementary Fig. 10). 

Vancomycin resistance took the clinical community by surprise
when it emerged in pathogenic enterococci in the late 1980s’”. In both
clinical pathogens”’ and contemporary soil environments’, resistance
results from the acquisition of a three-gene operon vanH-vanA-vanX
(vanHAX). These enzymes collectively reconstruct bacterial peptido-
glycan to terminate in D-alanine-D-lactate in place of the canonical
D-alanine-D-alanine, which is required for vancomycin binding and
subsequent antibiotic action. Although most forms of resistance are
attributed to a single gene, this complex mechanism is exclusively
associated with resistance and thus its presence provides unambiguous
confirmation of its role as a resistance determinant.

Figure 2 | Genetic diversity of ancient antibiotic resistance elements.
a, b, Unrooted Bayesian phylogenies of translated β-lactamase (bla) (a) and
tetracycline resistance (tetM) (b). Blue denotes predicted resistance enzymes,
and green those associated with other functions; permafrost-derived sequences
are labelled with the originating core name. Sequences in which resistance
activity has been biochemically verified are noted with a single asterisk
(Supplementary Information). The scale bar represents 0.1 substitutions per
site. Posterior probabilities are shown for a, and those of 0.7 or more are
indicated for b. All unlabelled tips derive from ancient sequences. BC1, Bear
Creek sample 1; BC4, Bear Creek sample 4.
Figure 3 | Ancient vancomycin resistance elements. a, vanHAX amplicons
used in this study, with primer names noted above each arrow. b, Unrooted
Bayesian phylogeny of translated vanA sequences; blue denotes strains with
vanHAX clusters confirmed to confer resistance; sequences containing stop
codons but homology throughout are noted with a single asterisk
(Supplementary Information). BC1, Bear Creek sample 1; BC4, Bear Creek
sample 4. c, VanA_A2 structure. Left: ribbon diagram of the VanA_A2 dimer (blue)
With few exceptions, the vanHAX operon is invariant in genetic
organization; it therefore offers a matchless template for confirming its
presence with PCR assays that span the vanHA and vanAX boundaries.
Two short qPCR assays were designed to confirm this contiguity
(Fig. 3a and Supplementary Information). Positive results, including
particularly high yields of the smallest amplicon, A6X (Supplemen-
tary Table 9), encouraged us to attempt amplification across both
boundaries (that is, the complete vanA gene) in a single 1.2-kilobase
amplicon. We also targeted fragments anchored on either boundary
and extending as far as possible into vanA. None of the sequences from
these products, or those generated by an independent laboratory (Sup-
plementary Information), were present in GenBank. No contaminants
were detected in more than 300 control reactions.

Table 1 | vanHAX permutation tests

                                         Probability of similarity by chance alone
                                         to Streptomyces coelicolor genes
Amplicon  Number  Length (base pairs)    vanH           vanA            vanX
H1A1      164     203-213                3.59 × 10^-3   4.39 × 10^-1?   0.24
H1A1*     12      209-216                2.83 × 10^-3   8.16 × 10^-16   0.28
H2A3      24      573-605                9.83 × 10^-3   1.27 × 10^-54   0.22
H2A4      79      666-681                4.33 × 10^-3   6.15 × 10^-?    0.18
A6X       159     170-179                0.11           6.87 × 10^-8    5.64 × 10^-?
A6X*      11      176-179                0.04           2.96 × 10^-8    3.63 × 10^-?
A2X       96      735-796                0.11           1.80 × 10^-?9   1.35 × 10^-?
HAX†      40      1,173-1,204            5.95 × 10^-?   9.32 × 10^-?    6.47 × 10^-?

* Clones from independent replication in France. † Includes both H1AX and H2AX.

Phylogenetic analyses showed that many of the ancient vanHAX 
sequences cluster with characterized glycopeptide-resistant strains of 
Actinobacteria containing vanHAX cassettes (for example streptomy- 
cetes, glycopeptide-producing Amycolatopsis species and the nitrogen-
fixing Frankia sp. EAN1pec) (Fig. 3b and Supplementary Figs 11 and
12). Another group falls between the actinobacterial sequences and the 
Firmicutes-derived cluster, which includes environmental Paenibacillus 
isolates and the pathogenic Enterococci, and may reflect an intermediate 
group. 

Permutation tests were performed with the PRSS algorithm”® (1,000 
permutations each) to confirm that the sequences were statistically 
similar to those of vancomycin resistance genes (vanHAX) present 
in modern Streptomyces. As shown in Table 1, all vanHA-spanning 
clones have significant similarity to vanH and vanA, and all vanAX-
spanning clones have significant similarity to vanA and vanX.
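
The logic of a shuffle-based similarity test can be sketched as follows. This is not the PRSS program: it keeps only the shuffle-and-rescore idea, uses a simple Smith-Waterman score with assumed match, mismatch and gap values, and reports an empirical p-value. An empirical estimate cannot fall below 1/(n_perm + 1), so very small probabilities such as those in Table 1 presumably rely on a fitted score distribution rather than on raw counts.

```python
import random


def local_alignment_score(a: str, b: str,
                          match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def permutation_p_value(query: str, target: str,
                        n_perm: int = 1000, seed: int = 0) -> float:
    """Empirical probability that a shuffled target scores at least as well as the real one."""
    rng = random.Random(seed)
    observed = local_alignment_score(query, target)
    letters = list(target)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(letters)  # preserves composition, destroys order
        if local_alignment_score(query, "".join(letters)) >= observed:
            at_least += 1
    # add-one correction so the estimate is never exactly zero
    return (at_least + 1) / (n_perm + 1)
```

Shuffling preserves the composition of the target sequence, so a low p-value indicates similarity beyond what base or residue composition alone would produce.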

To ascertain whether the complete vanA sequences are indeed func- 
tional and do not represent PCR artefacts or pseudogenes, we synthe- 
sized four open reading frames from the 40 H1AX/H2AX sequences 
(Supplementary Information). Two of these generated soluble proteins 
suitable for purification to homogeneity. Enzymatic characterization 
indicated that these ligases were indeed D-alanine-D-lactate-specific 
(Supplementary Fig. 13), and analysis revealed steady-state kinetic 
parameters consistent with contemporary enzymes derived from both 
the clinic and the environment (Supplementary Table 10). These 
results clearly show that the vanHAX genes identified in the ancient 
samples encode enzymes capable of genuine antibiotic resistance. 

We further confirmed the link between 30,000-year-old VanA and
contemporary enzymes by determining the three-dimensional struc-
ture of VanA_A2 by X-ray crystallography (Supplementary Table 11 and
Supplementary Information). The quaternary and tertiary structures
of VanA_A2, crystallized in the ATP-bound form, show the overall
D-Ala-D-X ligase fold of modern enzymes including VanA from
vancomycin-resistant Enterococcus faecium (Fig. 3c, d). Superposition
of ancient and modern VanA (Fig. 3c, d) reveals conservation of
quaternary and tertiary structure with minor differences in Mg2+
and ATP γ-phosphate coordination. The Ω-loop comprises the biggest
structural change; 13 amino-terminal residues (233-246) are absent
from the electron density map of VanA_A2, including His 241 (His 244
in modern VanA), responsible for the lactate selectivity. The last seven
Ω-loop residues (247-253) have clear electron density, undergoing a
drastic 13 Å shift. These structural differences, however, are not
reflected in enzyme function.

This work firmly establishes that antibiotic resistance genes predate 
our use of antibiotics and offers the first direct evidence that antibiotic 
resistance is an ancient, naturally occurring phenomenon widespread
in the environment. This is consistent with the rapid emergence of 
resistance in the clinic and predicts that new antibiotics will select for 
pre-existing resistance determinants that have been circulating within 
the microbial pangenome for millennia. This reality must be a guiding 
principle in our stewardship of existing and new antibiotics. 


METHODS SUMMARY 


Permafrost cores were collected at Bear Creek, Yukon, then shipped frozen to the 
McMaster Ancient DNA Centre and stored at −40 °C. All subsequent procedures
before PCR/qPCR amplification were performed in dedicated clean rooms, physically 
separated from laboratories containing modern DNA, bacterial cultures and amp- 
lification products. Contaminant leaching into the centre of cores after sampling was 
monitored by qPCR assays designed to detect E. coli DNA encoding the jellyfish green 
fluorescent protein sprayed onto coring equipment and the external surfaces of all 
collected cores. DNA was extracted from the centre of subsampled permafrost cores. 
PCR assays were designed to target vertebrates, plants, bacteria and specific antibiotic 
resistance elements. All products were sequenced with either the 454 GS-FLX 
platform or by standard cloning and sequencing procedures (GenBank accession 
numbers JN316287-JN366376). The ancient vanA gene identified from the 
permafrost was synthesized and expressed in E. coli, and the His6-tagged protein
was purified by immobilized metal-affinity chromatography. This protein was 
used in enzymatic studies to determine steady-state kinetics and was also studied 
by crystallography using the vapour-diffusion hanging-drop method. Data were 
collected at the National Synchrotron Light Source, Brookhaven National 
Laboratory, beamline X25 (PDB 1E4E). 

The evolution of the amniotic egg was one of the great evolutionary
innovations in the history of life, freeing vertebrates from an oblig- 
atory connection to water and thus permitting the conquest of 
terrestrial environments’. Among amniotes, genome sequences 
are available for mammals and birds?“*, but not for non-avian 
reptiles. Here we report the genome sequence of the North 
American green anole lizard, Anolis carolinensis. We find that 
A. carolinensis microchromosomes are highly syntenic with chicken 
microchromosomes, yet do not exhibit the high GC and low repeat 
content that are characteristic of avian microchromosomes’. Also, 
A. carolinensis mobile elements are very young and diverse—more 
so than in any other sequenced amniote genome. The GC content 
of this lizard genome is also unusual in its homogeneity, unlike 
the regionally variable GC content found in mammals and birds’. 
We describe and assign sequence to the previously unknown 
A. carolinensis X chromosome. Comparative gene analysis shows 
that amniote egg proteins have evolved significantly more rapidly 
than other proteins. An anole phylogeny resolves basal branches to 
illuminate the history of their repeated adaptive radiations. 

The amniote lineage divided into the ancestral lineages of mammals 
and reptiles ~320 million years ago. Today, the surviving members of 
those lineages are mammals, comprising ~4,500 species, and reptiles, 
containing ~17,000 species. Within the reptiles, the two major clades 
diverged ~280 million years ago: the lepidosaurs, which contain
lizards (including snakes) and the tuatara; and the archosaurs, contain- 
ing crocodilians and birds (the position of turtles remains unclear)°. 
For simplicity, we will refer here to lepidosaurs as lizards (Fig. 1). 

The study of the major genomic events that accompanied the trans- 
ition to a fully terrestrial life cycle has been assisted by the sequencing 
of several mammal (K.L.-T. et al, manuscript submitted) and three 
bird genomes**. The genome of the lizard A. carolinensis thus fills an 
important gap in the coverage of amniotes, splitting the long branch 
between mammals and birds and allowing more robust evolutionary 
analysis of amniote genomes. 


For instance, almost all reptilian genomes contain microchromo- 
somes, but these have only been studied at a sequence level in birds”’, 
raising the question as to whether the avian microchromosomes’ 
peculiar sequence features are universal across reptilian microchromo- 
somes*. Another example is the study of sex chromosome evolution. 
Nearly all placental and marsupial mammals share homologous sex 
chromosomes (XY)* and all birds share ZW sex chromosomes. 
However, lizards exhibit either genetic or temperature-dependent 
sex determination’®. Characterization of lizard sex chromosomes 
would allow the study of previously unknown sex chromosomes and 
comparison of independent sex chromosome systems in closely related 
species. 

Anolis lizards comprise a diverse clade of ~400 described species 
distributed throughout the Neotropics. These lizards have radiated, 
often convergently, into a variety of ecological niches with attendant 
morphological adaptations, providing one of the best examples of 
adaptive radiation. In particular, their diversification into multiple rep- 
licate niches on diverse Caribbean islands via interspecific competition 
and natural selection has been documented in detail’’. A. carolinensis is 
the only anole native to the USA and can be found from Florida and 
Texas up to North Carolina. We chose this species for genome sequen- 
cing because it is widely used as a reptile model for experimental ecology, 
behaviour, physiology, endocrinology, epizootics and, increasingly, 
genomics. 

The green anole genome was sequenced and assembled (AnoCar 2.0) 
using DNA from a female A. carolinensis lizard (Supplementary Tables 
1-4). Fluorescence in situ hybridization (FISH) of 405 bacterial arti- 
ficial chromosome (BAC) clones (from a male) allowed the assembly 
scaffolds to be anchored to chromosomes (Supplementary Table 5 and 
Supplementary Fig. 1). The A. carolinensis genome has been reported 
to have a karyotype of n = 18 chromosomes, comprising six pairs of 
large macrochromosomes and 12 pairs of small microchromosomes”’. 
The draft genome sequence is 1.78 Gb in size (see Supplementary
Table 3 for assembly statistics) and represents an intermediate
between genome assemblies of birds (0.9-1.3 Gb) and mammals
(2.0-3.6 Gb).

Figure 1 | Amniote phylogeny based on protein synonymous sites showing
major features of amniote evolution. Major characteristics of lizard evolution
including homogenization of GC content, high sex chromosome turnover and

We find that few chromosomal rearrangements occurred in the 280 
million years since anole and chicken diverged, as had been hinted at 
by previous comparisons using Xenopus and chicken’’. There are 259 
syntenic blocks (defined as consecutive syntenic anchors that are con- 
sistent in order, orientation and spacing, at a resolution of 1 Mb) 
between lizard and chicken (Supplementary Table 6 and Supplemen- 
tary Fig. 2). Interestingly, 19 out of 22 anchored chicken chromosomes 
are each syntenic to a single A. carolinensis chromosome over their 
entire lengths (Fig. 2a); by contrast, only 6 (of 23) human chromo- 
somes are syntenic to a single opossum chromosome over their entire 
lengths, even though the species diverged only 148 million years ago™*. 
Segmental duplications follow trends seen in other amniote genomes 
(Supplementary Note, Supplementary Table 7 and Supplementary 
Fig. 3). 
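
The block definition above (consecutive anchors consistent in order, orientation and spacing) can be illustrated with a small chaining routine. This is a sketch under stated assumptions, not the method used for the analysis: the Anchor fields are hypothetical, and "a resolution of 1 Mb" is interpreted here as a maximum gap of 1 Mb between neighbouring anchors in both genomes.

```python
from typing import List, NamedTuple


class Anchor(NamedTuple):
    chrom_a: str   # chromosome in genome A (for example lizard)
    pos_a: int     # position in genome A, in base pairs
    chrom_b: str   # chromosome in genome B (for example chicken)
    pos_b: int     # position in genome B, in base pairs
    strand: int    # +1 for same orientation, -1 for inverted


def _extends(prev: Anchor, nxt: Anchor, max_gap: int) -> bool:
    """True if nxt continues the block that ends with prev."""
    same_context = (prev.chrom_a, prev.chrom_b, prev.strand) == \
                   (nxt.chrom_a, nxt.chrom_b, nxt.strand)
    if not same_context:
        return False
    # Genome B positions must advance in the direction implied by the strand.
    step_b = (nxt.pos_b - prev.pos_b) * prev.strand
    step_a = nxt.pos_a - prev.pos_a
    return 0 < step_b <= max_gap and step_a <= max_gap


def build_blocks(anchors: List[Anchor], max_gap: int = 1_000_000) -> List[List[Anchor]]:
    """Group anchors, sorted along genome A, into syntenic blocks."""
    blocks: List[List[Anchor]] = []
    for anchor in sorted(anchors, key=lambda a: (a.chrom_a, a.pos_a)):
        if blocks and _extends(blocks[-1][-1], anchor, max_gap):
            blocks[-1].append(anchor)
        else:
            blocks.append([anchor])
    return blocks
```

A block ends whenever the partner chromosome, the orientation or the order changes, or when the next anchor lies more than max_gap away in either genome.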

Approximately 30% of the A. carolinensis genome is composed of 
mobile elements, which comprise a much wider variety of active repeat 
families than is seen for either bird’ or mammalian’ genomes. The 
most active classes are long interspersed (LINE) elements (27%) and 
short interspersed (SINE) elements (16%)'° (Supplementary Table 8). 
The majority of LINE repeats belong to five groups (L1, L2, CR1, RTE 
and R4) and seem to be recent insertions based on their sequence 
similarity (divergence ranges from 0.00-0.76%; ref. 17). This contrasts
with observations of mammalian genomes, where only a single family 
of LINEs—L1—has predominated over tens of millions of years. The 
DNA transposons comprise at least 68 families belonging to five super- 
families: hAT, Chapaev, Maverick, Tc/Mariner and Helitron’’. As with 
retrotransposons, the majority of DNA transposon families seem to be 
relatively young in contrast to the extremely few recently active DNA 
transposons found in other amniote genomes (Supplementary Table 9). 
Overall, A. carolinensis mobile elements feature significantly higher GC 
content (43.5%, P< 10° *°) than the genome-wide average of 40.3%. In 
addition to mobile elements, A. carolinensis exhibits a high density 
(3.5%) of tandem repeats, with length and frequency distributions 
similar to those of human microsatellite DNA’*. We now know that 
amniote genomes come in at least three types: mammalian genomes are 
enriched for L1 elements and have a high degree of mobile element 
accumulation, bird genomes are repeat poor with very little mobile 
element activity, while the lizard genome contains an extremely wide 
diversity of active mobile element families but has a low rate of accu-
mulation, which is reminiscent of the mobile element profile of
teleostean fishes”. 
high levels of repeat insertion are featured. Sex chromosome inventions are
indicated in red. Branch length is proportional to dS (the synonymous 
substitution rate); dS of each branch is indicated above the line. 


Most reptile genomes contain microchromosomes, but the numbers 
vary among species; the A. carolinensis genome contains 12 pairs of 
microchromosomes”, whereas the chicken genome contains 28 pairs. 
Bird microchromosomes have very distinctive properties compared to 
bird macrochromosomes, such as higher GC and lower repeat contents’, 
whereas lizard microchromosomes do not exhibit these features 
(Fig. 2b). Remarkably, all sequence anchored to microchromosomes 
in A. carolinensis also aligns to microchromosomes in the chicken 
genome, and all but one A. carolinensis microchromosome is syntenic 
to only a single corresponding chicken microchromosome (Fig. 2a). 
Microchromosomes conserved between A. carolinensis and chicken thus 
could have arisen in the reptile ancestor, whereas the remaining chicken 
microchromosomes could be derived in the bird lineage. Alternatively, 
the remaining chicken microchromosomes could have been present in 
the reptile ancestor but fused to form macrochromosomes in the lizard 
lineage. 

The A. carolinensis genome has surprisingly little regional variation 
of GC content, substantially less than previously observed for birds and 
mammals; it is the only amniotic genome known whose nucleotide 
composition is as homogenous as the frog genome® (Supplementary 
Figs 4 and 5). Figure 3 illustrates how local GC content is evolutionarily 
conserved between human chromosome 14 and chicken chromosome 
5, but to a much lesser degree with A. carolinensis chromosome 1. As 
all sequenced amniote genomes other than A. carolinensis contain 
these homologous varying levels of GC content (“‘isochores’)”®, the 
ancestral amniote GC heterogeneity is likely to have eroded towards 
homogeneity in this lizard’s lineage. It has been proposed that iso- 
chores with high GC content are a consequence of higher rates of 
GC-biased gene conversion in regions of higher recombination’. The 
greater GC homogeneity in the anole genome may thus reflect more 
uniform recombination rates, or else a substantially reduced bias 
towards GC during the resolution of gene conversion events in the 
A. carolinensis lineage (for a discussion, see ref. 5). 
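
Regional GC variation of the kind discussed above can be summarized by computing GC content in fixed windows and then measuring the spread across windows. The sketch below assumes non-overlapping windows and ignores ambiguous bases; the 20-kb default follows the window size given in the Figure 3 legend, and the function names are illustrative.

```python
import statistics
from typing import List, Optional


def gc_fraction(seq: str) -> Optional[float]:
    """GC fraction over unambiguous bases; None if the window has no A, C, G or T."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return None
    return (seq.count("G") + seq.count("C")) / acgt


def windowed_gc(seq: str, window: int = 20_000) -> List[Optional[float]]:
    """GC fraction in consecutive non-overlapping windows (the last may be shorter)."""
    return [gc_fraction(seq[i:i + window]) for i in range(0, len(seq), window)]


def gc_spread(values: List[Optional[float]]) -> Optional[float]:
    """Population standard deviation of window GC fractions, skipping empty windows."""
    usable = [v for v in values if v is not None]
    if not usable:
        return None
    return statistics.pstdev(usable)
```

A genome with isochore-like structure would be expected to show a larger spread across windows than a compositionally homogeneous one analysed with the same window size.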

Both temperature-dependent sex determination and XY genetic sex 
determination have been found in Iguania’®. Within the genus Anolis, 
there are species with heteromorphic XY chromosomes (including 
those with multiple X and Y chromosomes), and others with entirely 
homomorphic chromosomes’*. A. carolinensis is known to have 
genetic sex determination”', but the form of its sex chromosomes 
(ZW or XY) has thus far been unknown owing to a lack of obviously 
heteromorphic chromosomes. 

In-depth examination of male and female cells using FISH allowed
us to identify the microchromosome previously designated as ‘b’ as the
A. carolinensis X chromosome; it is present in two copies in females
and one in males. This chromosome is syntenic to chicken microchro-
mosome 15. Eleven BACs assigned to two scaffolds, 154 (3.3 Mb) and
chrUn0090 (1.8 Mb), hybridize via FISH to the p arms of the two X
chromosomes in females, and hybridize to the p arm of the single X
chromosome in males (Fig. 4 and Supplementary Fig. 1). A. carolinensis
thereby shows a pattern representative of a male heterogametic system
of genotypic sex determination. We have not identified the Y chro-
mosome, but we hypothesize that A. carolinensis possesses both X
and Y chromosomes, as both male and female cells contain the same
number of chromosomes.

Figure 2 | A. carolinensis-chicken synteny map reveals synteny of reptile
microchromosomes but dissimilar GC and repeat content. a, Very few
rearrangements have occurred in the 280 million years since A. carolinensis and
chicken diverged. A. carolinensis microchromosomes are exclusively syntenic
to chicken microchromosomes. Horizontal coloured bars depict the six
A. carolinensis macrochromosomes (1-6) and the six (of 12) A. carolinensis
microchromosomes that have sequence anchored to them that is syntenic to the
chicken genome (7, 8, 9, X, LGg, LGh). Chromosomes that could be ordered by
size were assigned a number; the smaller microchromosomes that could not be
distinguished by size were assigned a lowercase letter. Each colour corresponds
to a different chicken chromosome as indicated in the key. Any part of an
A. carolinensis chromosome that is syntenic to a chicken microchromosome is
indicated by ‘m’. b, Chicken microchromosomes have both higher GC content
and lower repeat content than chicken macrochromosomes, whereas
A. carolinensis chromosomes do not vary in GC or repeat content by
chromosome size. Large circles designate the GC percentage of each
chromosome in the chicken and lizard genomes with greater than 100 kb of
sequence anchored to it. Small circles designate the percentage of the genome
made up of repetitive sequence of each chromosome in the chicken (blue
circles) and lizard (red circles) genomes.

The 5.1 Mb of sequence assigned to the X chromosome contains 62
protein-coding genes (Supplementary Table 10); Gene Ontology (GO)
terms associated with these genes show no significant enrichment. It is
very likely that there is more X chromosome sequence that is currently
labelled as unanchored scaffolds in the AnoCar 2.0 assembly. Identi-
fication of the A. carolinensis sex determination gene will require
considerable functional biology, but we note that the chicken sex
determination gene DMRT1 is located on A. carolinensis chromosome
2 and that SOX3 (the X chromosome paralogue of the therian mammal
sex determination gene SRY) is located on an unanchored A. caroli-
nensis scaffold; these genes are thus unlikely to be the A. carolinensis
sex determination gene.

Figure 3 | The A. carolinensis genome lacks isochores. The A. carolinensis
genome shows only very local variation in GC content, unlike the human and
chicken genomes, which also show larger trends in GC variation, sometimes
called isochores. Syntenic regions of human chromosome 14, chicken
chromosome 5 and A. carolinensis chromosome 1 are shown. The human and
chicken regions are inverted and rearranged to align with the A. carolinensis
region. Blue lines depict GC percentage in 20-kb windows. The purple line
designates the genome average. Green lines represent examples of syntenic
anchors between the three genomes.

All ten A. carolinensis individuals (originating from South Carolina 
and Tennessee) used for FISH mapping showed large pericentromeric 
inversions in one or more of chromosomes 1-4, with no correlation 
between different chromosomal inversions or with the sex of the lizard 
(see Supplementary Note, Supplementary Table 11 and Supplemen- 
tary Fig. 6). 

Figure 4 | The A. carolinensis genome contains a newly discovered X
chromosome. a, b, The X chromosome, a microchromosome, is found in one
copy in male A. carolinensis (a) and in two copies in females (b). The BAC
206M13 (CHORI-318 BAC library) is hybridized to the p arm of the X
chromosome using FISH in both male and female metaphase spreads. 206M13
and ten other BACs showed this sex-specific pattern in cells derived from five
male and five female individuals. Original magnification, ×1,000.

A total of 17,472 protein-coding genes and 2,924 RNA genes were
predicted from the A. carolinensis genome assembly (Ensembl release
56, September 2009). We built a phylogeny for all A. carolinensis genes
and their homologues in eight other vertebrate species (human,
mouse, dog, opossum, platypus, chicken, zebra finch and pufferfish),
allowing us to identify a conservative set of 3,994 one-to-one ortholo-
gues, that is, genes that have not been duplicated or deleted in any of
these vertebrates since their last common ancestor. These gene phylo-
genies were also used to identify genes that arose by duplication in the
lizard lineage after the split with the avian lineage and, separately, those
that were lost in the mammalian lineage after the mammal-reptile split
(Fig. 1, Supplementary Note, Supplementary Fig. 7 and Supplementary
Table 12).
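
The one-to-one criterion can be illustrated with a simple count-based filter. This is a simplification made for illustration: the analysis described above relied on gene phylogenies, whereas the sketch only checks that each family has exactly one member in every species. The family table format and the species labels are hypothetical.

```python
from typing import Dict, List, Sequence

# The nine species compared in the text (labels are illustrative).
SPECIES = ("anole", "human", "mouse", "dog", "opossum", "platypus",
           "chicken", "zebra_finch", "pufferfish")


def one_to_one_families(families: Dict[str, Dict[str, List[str]]],
                        species: Sequence[str] = SPECIES) -> List[str]:
    """Return ids of families with exactly one gene in every listed species."""
    kept = []
    for family_id, members in families.items():
        if all(len(members.get(sp, [])) == 1 for sp in species):
            kept.append(family_id)
    return kept
```

Families with a duplication in any lineage, or with no detectable member in a species, are dropped by this filter, which is what makes the resulting set conservative.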

We found 11 A. carolinensis opsin genes that have no mammalian 
orthologues (but have orthologues in invertebrates, fishes and frog), 
and thus seem to have been lost during mammalian evolution (Sup- 
plementary Table 13). The large repertoire of opsins may contribute to 
the excellent colour vision of anoles—including the ability to see in the 
ultraviolet range—and also may contribute to their hyperdiversity by 
allowing the evolution of diverse, species-specific colouration of the 
dewlap, which has an important role in sexual selection and species 
recognition". Similarly, olfactory receptor and β-keratin genes are
highly duplicated in A. carolinensis (Supplementary Note and Sup- 
plementary Fig. 9). 

Many reptiles, including green anoles, differ from placental mammals 
in being oviparous (laying eggs). Vivipary in placental mammals is a 
derived state, reflected in their loss of some egg-related genes. We used 
mass spectrometry to identify proteins present in the immature 
A. carolinensis egg, as most egg proteins are produced in the mother’s 
body and then transported into the immature egg. We found that in 
contrast with mammals, reptiles have lineage-specific gene duplica-
tions, including in vitellogenins (VTGs), apovitellenin-1, ovomucin-α
and three homologues of ovocalyxin-36, a chicken eggshell matrix 
protein. 

Our results show rapid evolution of egg protein genes among 
amniotes. Specifically, we found proteins from 276 A. carolinensis 
genes in immature A. carolinensis eggs (Supplementary Tables 14 
and 15), of which only 50 have been confirmed to be present in chicken 
eggs by mass spectrometry**”’. These genes include VTGs, a lysozyme, 
vitelline membrane outer layer protein 1 (VMO1) paralogues, protease 
inhibitors, natterin and nothepsin. By aligning genes that are one-to- 
one orthologues in A. carolinensis and chicken, we found that egg 
proteins evolve significantly more rapidly than non-egg proteins 
(mean dN/dS values (ratio of the rate of non-synonymous substitu- 
tions to the rate of synonymous substitutions) of 0.186 and 0.135, 
respectively; P= 1.2 X 10°), which reflects reduced purifying selec- 
tion and/or more frequent episodes of adaptive evolution. 

Using multiple vertebrate genome sequences, we identified three
VMO1 paralogues (which we name α, β and γ) that we infer to have
been present in the last common ancestor of all reptiles and mammals.
Whereas at least one of VMO1-α, VMO1-β and VMO1-γ has been lost
in all other amniote genomes, the A. carolinensis genome contains
representatives of all three paralogues. Moreover, the A. carolinensis-
specific VMO1-α family has grown to 13 members and has experi-
enced positive selection of amino acid substitutions within a negatively
charged, probably substrate-binding cavity; changes that, presumably, 
modify its lysozyme-like transferase activity (Supplementary Note, 
Supplementary Fig. 8 and Supplementary Tables 16 and 17). 

The extensive and active repeat repertoire of A. carolinensis has 
allowed us to discover the origin of several mammalian conserved ele- 
ments. Through the process of exaptation (a major change in function of 
a sequence during evolution), certain mobile elements that were active 
in the amniote ancestor have become conserved, and presumably func- 
tional, in mammals, while remaining active mobile elements in 
A. carolinensis. The origin of these conserved mammalian sequences 
in mobile elements was not recognizable without comparison to a distant
and repeat-rich genome sequence. We identified 96 such exapted
elements (see Supplementary Table 18) in the human genome tracing 
back to mobile elements present in the amniote ancestor that are still 
present in A. carolinensis, particularly the CR1, L2 and gypsy families. 
Although most exapted elements are non-coding and probably
serve a regulatory function, we also identified a protein-coding exon
that was exapted from an L2-like LINE, now constituting exon 2 in a
mammal-specific N-terminal region of the MIER1 (mesoderm induction
early response 1) protein. This exon is highly conserved across 29
mammals and therefore probably represents a mammalian innovation 
since the amniote ancestor. 

GO terms associated with the transcription start site closest to each 
exapted element in the human genome show enrichment for neuro- 
developmental genes (see Methods), with “ephrin receptor binding”, 
“nervous system development” and “synaptic transmission” being 
strongly enriched (all P values<5 X10 *). These enrichments are 
consistent with adaptive changes in neurodevelopment occurring 
during the emergence of mammals. 

Anolis lizards are a textbook case of adaptive radiation, having 
diversified independently on each island in the Greater Antilles and 
throughout the Neotropics, producing a wide variety of ecologically 
and morphologically differentiated species, with as many as 15 found 
at a single locality. Although anoles are widely used as a model system
for phylogenetic comparative studies, it has been difficult to determine 
the evolutionary relationships among major anole clades owing to 
rapid evolutionary radiations associated with access to new dimen- 
sions of ecological opportunity. Successfully resolving the relatively 
short branching events associated with such a radiation requires a 
wealth of data from loci evolving at an appropriate rate. 

We used the genome sequence of A. carolinensis to develop a new 
phylogenomic data set comprised of 20 kb of sequence data sampled 
from across the genomes of 93 species of anoles (Supplementary 
Tables 19 and 20). Analyses of this data set infer a well-supported 
phylogeny that reinforces and clarifies the adaptive and biogeographic 
history of anoles (Fig. 5, details in Supplementary Fig. 10). First, our 
phylogenomic analysis reaffirms previous molecular and morpho- 
logical studies indicating that similar anole habitat specialists have 
Figure 5 | A phylogeny of 93 Anolis species clarifies the biogeographic
history of anoles. Anolis ecomorphs derive from convergent evolution and not 
from frequent inter-island migration. Using conserved primer pairs distributed 
across the genome of A. carolinensis, we obtain sequences from 46 genomically 
diverse loci evolving at a range of evolutionary rates and representing both 
protein-coding and non-coding regions. Maximum likelihood analyses of this 
new data set of 20 kb aligned nucleotides infer nearly all previously established 
anole relationships while also partially resolving the basal relationships that 
have plagued previous studies. Open circles indicate bootstrap (bs) values <70; 
grey-shaded circles, 70< bs <95; filled circles, bs >95. 
evolved independently on each of the four large Greater Antillean
islands. Second, our analyses suggest a complex biogeographic scenario 
involving a limited number of dispersal events between islands and 
extensive in situ diversification within islands. The closest relatives of 
Anolis occur on the mainland and the phylogeny confirms the existence 
of two colonizations, one into the southern Lesser Antilles and the 
second producing the diverse adaptive radiations throughout the rest 
of the Caribbean. Within this latter clade, anoles initially diversified 
primarily on the two larger Greater Antillean islands (although Puerto 
Rico also seems to have been involved) before subsequently undergoing 
secondary radiations on all of the islands and eventually returning to the 
mainland, where this back-colonization has produced an extensive 
evolutionary radiation. The phylogeny also indicates that very few 
inter-island dispersal events occurred in Greater Antillean evolution. 
Rather, the Greater Antillean faunas, renowned for the extent to which 
the same ecomorphs are found on each island, are primarily the result of 
convergent evolution.

The genome sequence of A. carolinensis allows a deeper understand- 
ing of amniote evolution. Filling this important reptilian node with a 
sequenced genome has revealed derived states in each major amniote 
branch and has helped to illuminate the amniote ancestor. However, 
the tree of sequenced reptilian genomes is still extremely sparse, and 
the sequencing of additional non-avian reptiles would be necessary to 
fully understand how typical A. carolinensis and the sequenced bird 
genomes are of the entire reptile clade. 

In addition to the utility of the A. carolinensis genome sequence as a 
representative of non-avian reptiles, Anolis species are a unique 
resource for the study of adaptive radiation and convergent evolution. 
With their invasions of and subsequent radiations on Caribbean 
islands, anoles provide a terrestrial analogue to stickleback and cichlid 
fish, which underwent adaptive evolution in separate aquatic environ- 
ments. Just as genomic research in sticklebacks has deepened the study 
of aquatic ecological speciation, a large-scale genomic phylogenetic 
survey of the Caribbean anoles would be an opportunity for detailed 
study of adaptive evolution in a land animal, in particular because
anole genomes contain large numbers of active mobile elements that 
we speculate could form substrates for exaptation of novel regulatory 
elements. 


METHODS SUMMARY 


A full description of methods, including sample collection, sequencing, assembly, 
anchoring, mass spectrometry and all sequence analysis, can be found in Sup- 
plementary Information. All animal experiments were approved by the MIT 
Committee for Animal Care. 
Innate immune recognition of bacterial ligands by
NAIPs determines inflammasome specificity 


Eric M. Kofoed¹ & Russell E. Vance¹


Inflammasomes are a family of cytosolic multiprotein complexes 
that initiate innate immune responses to pathogenic microbes by 
activating the caspase 1 protease. Although genetic data support
a critical role for inflammasomes in immune defence and inflammatory
diseases, the molecular basis by which individual inflammasomes
respond to specific stimuli remains poorly understood.
The inflammasome that contains the NLRC4 (NLR family, CARD
domain containing 4) protein was previously shown to be activated
in response to two distinct bacterial proteins, flagellin and PrgJ,
a conserved component of pathogen-associated type III secretion 
systems. However, direct binding between NLRC4 and flagellin or 
PrgJ has never been demonstrated. A homologue of NLRC4, 
NAIP5 (NLR family, apoptosis inhibitory protein 5), has been 
implicated in activation of NLRC4 (refs 7-11), but is widely 
assumed to have only an auxiliary role, as NAIP5 is often dispensable
for NLRC4 activation. However, Naip5 is a member of a
small multigene family, raising the possibility of redundancy and
functional specialization among Naip genes. Here we show in mice 
that different NAIP paralogues determine the specificity of the 
NLRC4 inflammasome for distinct bacterial ligands. In particular, 
we found that activation of endogenous NLRC4 by bacterial PrgJ 
requires NAIP2, a previously uncharacterized member of the 
NAIP gene family, whereas NAIP5 and NAIP6 activate NLRC4 
specifically in response to bacterial flagellin. We dissected the 
biochemical mechanism underlying the requirement for NAIP 
proteins by use of a reconstituted NLRC4 inflammasome system. 
We found that NAIP proteins control ligand-dependent oligomer- 
ization of NLRC4 and that the NAIP2-NLRC4 complex physically 
associates with PrgJ but not flagellin, whereas NAIP5-NLRC4 
associates with flagellin but not PrgJ. Our results identify NAIPs 
as immune sensor proteins and provide biochemical evidence for a 
simple receptor-ligand model for activation of the NAIP-NLRC4 
inflammasomes. 

A fundamental question in immunology is how host defence is 
initiated in response to specific microbial ligands. The inflammasome 
containing the NLRC4 protein activates caspase 1 (CASP1) in response 
to the carboxy terminus of bacterial flagellin, as well as in response to
the inner rod protein of the type III secretion systems of diverse
bacterial species (for example, PrgJ of Salmonella enterica serovar
Typhimurium). Activated CASP1 processes interleukin (IL)-1β and
IL-18 inflammatory cytokines and induces a rapid and inflammatory
host cell death called pyroptosis. In certain cases, NLRC4 activation
requires NAIP5, as Naip5−/− mice fail to activate NLRC4 or CASP1 in
response to infection with Legionella pneumophila or in response to the
C terminus of flagellin. Interestingly, however, NAIP5 is not essential
for NLRC4 activation in response to S. enterica Typhimurium or
PrgJ.

In addition to Naip5, C57BL/6 mice express three other Naip genes 
(Naip1, Naip2 and Naip6), the functions of which remain unknown.
We hypothesized that each NAIP paralogue may have evolved to be
specific for a unique bacterial ligand. We first focused on NAIP2, as it
appeared to be highly expressed in C57BL/6 mice. We used specific
short hairpin RNAs (shRNAs) to knock down Naip2 expression in
primary bone-marrow-derived macrophages. ShRNA1 and shRNA2
specifically reduced NAIP2 protein levels without targeting other 
NAIP paralogues, whereas empty vector, shRNA3 or a scrambled 
control shRNA had little effect on NAIP2 protein levels (Supplementary
Fig. 1a, b). Macrophages expressing these shRNAs were then
infected with flagellin-deficient Listeria strains that inducibly express
PrgJ (Listeria-PrgJ) or flagellin (Listeria-FlaA). A Listeria-based system
was chosen because it is an efficient means for delivering PrgJ to macrophages,
and because it allows for controlled comparisons of PrgJ and
FlaA within a single experimental system. Notably, knockdown of Naip2
prevented pyroptosis and CASP1 activation by Listeria-PrgJ (Fig. 1a–c).
By contrast, Naip2 knockdown did not affect inflammasome activation
by Listeria-FlaA (Fig. 1b, c) or L. pneumophila, which expresses flagellin
but not PrgJ (Supplementary Fig. 1c). Instead, flagellin-dependent
inflammasome activation depended on Naip5, as previously shown.
Inflammasome activation by wild-type Salmonella, which encodes both
flagellin and PrgJ, was not significantly affected by Naip2 knockdown
(Fig. 1d, e). However, knockdown of Naip2 in Naip5−/− macrophages
significantly reduced or abolished inflammasome activation by wild-type
Salmonella (Fig. 1d, e), indicating that both NAIP2 and NAIP5
recognize Salmonella. Interestingly, inflammasome activation by
flagellin-deficient (FliC− FljB−) Salmonella, which still express PrgJ,
depended entirely on Naip2 (Fig. 1d, e). Taken together, these data 
indicate that Naip2 is specifically required for activation of the NLRC4 
inflammasome by PrgJ, in contrast to Naip5, which seems to be spe- 
cifically required for NLRC4 activation by flagellin. 

Biochemical analysis of the inflammasome in macrophages is com- 
plicated by the expression of multiple NAIP proteins and by their low 
expression levels. We therefore decided to reconstitute the NLRC4 
inflammasome in non-immune 293T cells, which do not express 
NLRC4 or NAIPs, so that the functions of individual NAIP proteins 
could be analysed. 293T cells transiently transfected with green fluor- 
escent protein (GFP)-marked vectors encoding wild-type NLRC4, 
NAIP5 and CASP1 did not exhibit significant spontaneous inflamma- 
some activation, and instead, most cells expressed GFP (Fig. 2a). 
However, when flagellin (FlaA) from L. pneumophila was co-expressed 
with NLRC4, NAIP5 and CASP1, we observed a significant loss of 
GFP⁺ cells and an increase in the number of dead (7AAD⁺) cells
(Fig. 2a). This result was highly reminiscent of flagellin-dependent
activation of the endogenous NAIP5-NLRC4 inflammasome in macrophages,
which also results in a rapid CASP1-dependent cell death, loss
of membrane integrity, and release of cytosolic contents and GFP.
Similar to the genetic requirement for Nlrc4, Naip5 and Casp1 in macrophages,
we found that NAIP5, NLRC4, catalytically active
CASP1, and FlaA are all required to trigger cell death and loss of membrane
integrity/GFP in reconstituted 293T cells (Fig. 2b, c). The reconstituted
NAIP5-NLRC4 inflammasome also recapitulated the ability of
native inflammasomes to process CASP1 and IL-1β in response to
cytosolic flagellin (Supplementary Fig. 2). Consistent with a lack of a
role for NAIP5 in recognition of PrgJ by macrophages, the reconstituted
NAIP5-NLRC4 inflammasome did not respond to PrgJ (Fig. 2d, e). By 
Figure 1 | NAIP2 is required in macrophages for inflammasome activation
in response to PrgJ. a–c, Primary bone-marrow-derived macrophages
expressing shRNAs targeting Naip2 (N2) (or controls) were infected with
flagellin-deficient L. monocytogenes (multiplicity of infection = 5) expressing a
secreted ActA100-PrgJ (pPrgJ) or ActA100-FlaA (pFlaA) fusion protein under
IPTG-inducible control. Cell death (±s.d.) was measured in triplicate by LDH
release 6 h after infection (a, b), or active CASP1 (p10) was measured by
western blotting of cell supernatants (c). d, e, NAIP2 knockdown cells were
infected with wild-type or flagellin-deficient (FliC− FljB−) S. enterica
Typhimurium and inflammasome activation was measured by LDH release
(±s.d.) at 3 h after infection (d) or CASP1 processing (e). Data shown are
representative of two (c, e) or three (a, b, d) independent experiments.
*P < 0.02 as compared to scramble (Student's t-test, two-tailed).

Figure 2 | Reconstitution of the NAIP5-NLRC4 inflammasome in 293T
cells. a, GFP-marked expression vectors encoding NLRC4, NAIP5, CASP1
and/or flagellin (FlaA) were transiently transfected into 293T cells. Cells were
imaged for differential interference contrast (DIC) and GFP fluorescence 48 h
later. Dead cells were stained with 7AAD. b, c, GFP⁺ cells (b) and 7AAD⁺
cells (c) were quantified (±s.d.) as in a, but with specific expression vectors
omitted from the transfection as indicated. CASP1(C284A) is a catalytically
dead mutant. d, e, 293T cells were transfected as indicated and analysed as
above. Data shown (±s.d.) are representative of at least three independent
experiments. *P < 0.02 (Student's t-test, two-tailed); NS, not significant.


contrast, a reconstituted NAIP2-NLRC4 inflammasome responded 
specifically to PrgJ but not flagellin (Fig. 2d, e). Taken together, we 
conclude that we have successfully reconstituted NAIP2-NLRC4 and
NAIP5-NLRC4 inflammasomes that exhibit all the known require- 
ments and specificities of the native inflammasomes. 

It is believed that activated inflammasomes assemble into high-molecular-mass
multiprotein complexes, but this has not been
demonstrated for the NLRC4 inflammasome. To visualize inflammasome
assembly, 293T cells were transfected with NAIP5, NLRC4 and
FlaA in various combinations, but CASP1 was omitted so that cell
death and loss of cellular contents (and assembled inflammasomes)
would not occur. Digitonin-solubilized cell lysates were resolved on
blue native (BN) polyacrylamide gel electrophoresis (PAGE) gels. A
marked shift of NLRC4 from a monomer (~120 kDa) to an oligomeric
complex (~1,000 kDa) was seen in the presence of NAIP5 and FlaA.
NAIP5 was also contained within the high-molecular-mass oligomeric
complex (Fig. 3a). The association of NAIP5 and NLRC4 in the same
complex was validated by co-immunoprecipitation (Supplementary
Fig. 3). NLRC4 oligomerization was induced by either untagged
FlaA or a GFP-FlaA fusion protein (Fig. 3a), both of which activate 
NLRC4-CASP1. Importantly, assembly of the NLRC4 inflammasome 
required FlaA (Supplementary Fig. 4a) and was not observed in the 
absence of NAIP5 (Fig. 3a), indicating that a biochemical function of 
NAIP5 is to promote NLRC4 oligomerization. 

Despite strong genetic evidence that NLR proteins, such as NAIP5 
and NLRC4, function as microbial ‘sensors’, there is no biochemical 
evidence that NLRs interact directly with microbial ligands. In fact, 
some studies of the NLRP3 inflammasome, as well as analyses of
analogous proteins from plants, suggest that at least some NLRs
recognize pathogens indirectly. To determine if the oligomerized
NAIP5-NLRC4 complex also contains flagellin, we subjected samples
separated in the first dimension by native PAGE to a second dimension
of SDS-PAGE. To facilitate detection of flagellin, we used a 6×Myc-tagged
flagellin, which activates the inflammasome identically to native
flagellin (data not shown). This approach revealed that FlaA was indeed 
present in a high-molecular-mass complex, along with NAIP5 and 
NLRC4 (Fig. 3b). NAIP5 exhibited a weak flagellin-dependent mobility 
shift in the absence of NLRC4 (Supplementary Fig. 4b), indicating that 
NLRC4 is not essential for flagellin recognition, although formation/ 
stabilization of the oligomerized complex seems to be significantly 
enhanced by NLRC4. FlaA expressed alone was present in cell extracts 
Figure 3 | NAIP5 is required for formation of a hetero-oligomeric complex
that contains NLRC4, NAIP5 and flagellin. a, 293T cells were transfected as
indicated, followed by analysis by blue native PAGE or SDS-PAGE, and
western blotting. *NS, nonspecific band. b, 293T cells were transfected as
indicated and lysates were separated by a first dimension of blue native PAGE
followed by a second dimension of SDS-PAGE. c, d, 293T cells were transfected
as indicated and samples were processed and analysed as in a. Data shown are 
representative of at least three independent experiments. 


only as a monomer (Supplementary Fig. 4c). Taken together, these 
observations provide evidence for a simple receptor-ligand model of 
NAIP5-NLRC4 activation by flagellin. 

Consistent with the autoinhibitory function of the leucine-rich 
repeats (LRRs) in other NLRs, we found that NAIP5(ΔLRR) and
NLRC4(ΔLRR) constitutively activated CASP1-dependent cell death,
independent of the presence of flagellin (Supplementary Fig. 5).
Interestingly, NLRC4(ΔLRR) was able to activate CASP1 in the
absence of NAIP5, whereas constitutively active NAIP5(ΔLRR)
required wild-type NLRC4 to activate CASP1. This result suggests that
NAIP5 functions upstream of NLRC4. Indeed, NAIP5(ΔLRR) was able
to induce the oligomerization of wild-type NLRC4 (Fig. 3c), whereas
the spontaneous oligomerization of NLRC4(ΔLRR) did not require
NAIP5 (Fig. 3d). Spontaneous oligomerization of NLRC4(ΔLRR)
did require the nucleotide-binding domain (NBD) of NLRC4, as a
K175R mutation previously shown to disrupt NBD function
abolished NLRC4(ΔLRR) auto-oligomerization (Fig. 3d). The ability
of NAIP5 to induce oligomerization of NLRC4 in response to flagellin 
required both the NBD and amino-terminal BIRs of NAIP5, but did 
not require the N-terminal CARD of NLRC4 (Supplementary Fig. 4d, 
e), whereas functional CASP1 activation required all of these domains 
(Supplementary Figs 5 and 6). Taken together, these data are indi- 
cative of a working model (Supplementary Fig. 7) in which NAIP5 is 
activated by flagellin and induces downstream NLRC4 oligomeriza- 
tion and CASP1 activation. 

Consistent with a specific role for NAIP2 in recognition of PrgJ, we 
found that PrgJ did not induce the oligomerization of NAIP5-NLRC4 
(Fig. 4a) but did induce oligomerization of NAIP2-NLRC4. Oligo- 
merization of NLRC4 did not occur when co-expressed with NAIP2 
alone or with NAIP2 and FlaA (Fig. 4b). Interestingly, NAIP6 resembled 
NAIP5 and supported NLRC4 oligomerization in response to FlaA but 
Figure 4 | NAIP paralogues confer specificity to the NLRC4 inflammasome.
a, 293T cells were co-transfected with wild-type NAIP5 and NLRC4, alone or in
combination with 6×Myc-FlaA or 6×Myc-PrgJ followed by blue native
PAGE 48 h later. *NS, nonspecific band. Whole-cell lysates were also separated 
by conventional 4-12% SDS-PAGE to control for expression of each 
transfected gene construct (left panel). b, 293T cells were transfected with wild- 
type NAIP2 and NLRC4 and analysed as in a. c, 293T cells were transfected with 
wild-type NAIP6 and NLRC4, and analysed as in a. Data shown are 
representative of at least three independent experiments. 


not PrgJ (Fig. 4c), perhaps providing an explanation for the previously 
puzzling observation that Naip5−/− cells can respond to high levels of
flagellin. In contrast, NAIP1 is an ‘orphan’ NAIP because it responded
neither to PrgJ nor flagellin (Fig. 2d, e and Supplementary Fig. 8). 

Our results demonstrate that the ability of the NLRC4 inflamma- 
some to assemble and functionally activate CASP1 in response to spe- 
cific bacterial ligands is dictated by NAIP family members. The most 
parsimonious model to account for our results is that NAIP proteins 
function as direct receptors for bacterial ligands (Supplementary Fig. 7). 
Although NLRC4 was previously suspected to be the cytosolic flagellin 
sensor, we hypothesize that a main function of NLRC4 may instead
be to serve as an adaptor, downstream of NAIP proteins, to recruit 
CASP1 via a CARD-CARD interaction. NLRC4 may also have an 
important function in ligand binding or in stabilizing NAIP- 
NLRC4-ligand complexes, but the specificity of the complexes for 
particular ligands seems to be controlled by NAIP proteins. 

The number and sequence of Naip paralogues vary significantly 
among inbred mouse strains, and have been suggested to be evolving 
rapidly. Indeed, the murine Naip locus was originally identified by a
forward genetic approach which took advantage of the widely varying
susceptibility of inbred mouse strains to L. pneumophila infection.
The single known human NAIP orthologue may also exist within a
rapidly evolving locus; our results indicate that it will be of great
interest to establish the specificity of the human NAIP protein. We
propose that Naip gene evolution represents a fascinating example of
the molecular arms race between bacteria and their hosts. 


METHODS SUMMARY 

Naip2 knockdown. Primary C57BL/6 bone marrow cells were transduced with 
pLKO.1-based lentivirus encoding shRNAs that specifically target Naip2 or con- 
trols. Bone marrow cells were differentiated into macrophages by culture in media 
containing macrophage colony-stimulating factor (MCSF). On day 4 of culture, 
transduced macrophages were selected by addition of puromycin (5 µg ml⁻¹). On
day 8 of culture, macrophages were re-plated and infected the next day with
Listeria monocytogenes or S. enterica Typhimurium expressing flagellin or PrgJ,
and inflammasome activation was measured by assaying release of the cytosolic
enzyme lactate dehydrogenase (LDH) or by western blotting for processed (p10)
CASP1.

Reconstituted inflammasome. The inflammasome was reconstituted by transfection
of 293T cells with MSCV2.2-IRES-GFP-based expression vectors encoding
various mouse (C57BL/6-derived) Naip genes, Nlrc4 and caspase 1. Inflammasome
oligomerization was assessed in digitonin (1%) lysates using a Bis-Tris
NativePAGE system (Invitrogen) followed by western blotting. 

Statistical analysis. Statistical differences were calculated with an unpaired two- 
tailed Student’s t-test using GraphPad Prism 5.0b. 
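
As an illustration of the unpaired Student's t-test named above, the following minimal sketch computes the pooled-variance t statistic and its degrees of freedom with the Python standard library. The function name `student_t` and the triplicate values are hypothetical and are not data from this study; the analyses reported here were run in GraphPad Prism, not with this code. Converting t to a two-tailed P value additionally requires the t distribution, which is left out of the sketch.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Unpaired Student's t statistic (equal variances assumed) and degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Hypothetical triplicate % LDH release values, for illustration only.
scramble = [60.0, 62.0, 64.0]
knockdown = [10.0, 12.0, 14.0]

t_stat, dof = student_t(scramble, knockdown)
print(f"t = {t_stat:.2f}, df = {dof}")
```

With triplicate measurements in each group the test has four degrees of freedom, so a large |t| is needed before a difference is called significant.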


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS

Mice. C57BL/6J (B6) mice were purchased from Jackson Labs and bred at UC 
Berkeley. Naip5−/− mice on a pure B6 background were described previously.
Animal experiments were approved by the UC Berkeley Animal Care and Use 
Committee. 

Cell culture. HEK293T cells were grown in complete media (DMEM, 10% FBS, 
100 units ml⁻¹ penicillin, 100 µg ml⁻¹ streptomycin, 2 mM L-glutamine). Bone
marrow macrophages were differentiated from bone-marrow-derived precursor
cells using macrophage colony stimulating factor as described previously.

Transient transfections. HEK293T cells were plated at 8 X 10° cells per well in 
6-well plates, transfected the next day using Lipofectamine 2000, and collected for 
flow cytometric analysis 48 h later. 

Measurement of cell death. Cell death of HEK293T cells was measured by flow 
cytometry measuring GFP and 7AAD fluorescence. Cells were stained with 7AAD 
(BD Pharmingen) according to the manufacturer’s instructions. Death of immortalized
macrophages in response to L. pneumophila and LFn-FlaA was measured by
following release of the intracellular enzyme lactate dehydrogenase (LDH) as
described previously. Infection of immortalized macrophages with L. pneumophila
was performed using a multiplicity of infection (MOI) of 2, and cell death was
measured 4 h later as described previously. LFn-FlaA is a recombinant 6×His-tagged
fusion protein encoding the first (non-enzymatically active) 263 amino acids
of lethal factor from Bacillus anthracis fused to full-length flagellin (FlaA) from
L. pneumophila (J. von Moltke and R.E.V., unpublished). LFn-FlaA and B. anthracis
protective antigen (PA) were purified from E. coli as described previously.
Endotoxin was removed from these proteins using Detoxi-Gel (Pierce) according 
to the manufacturer’s protocol. 

Western blotting and native PAGE. Blue native gel electrophoresis was per- 
formed using the Bis-Tris NativePAGE system by Invitrogen according to the 
manufacturer’s instructions. Briefly, HEK293T cells were transfected for 48h, 
washed twice with cold PBS, trypsinized for 3 min at 37°C, re-suspended in 
complete media, and pelleted by centrifugation at 400g for 5 min at 4°C. Cell 
pellets were washed twice with cold PBS, followed by re-suspension in 1% digitonin
native lysis buffer (50 mM BisTris, 50 mM NaCl, 10% w/v glycerol, 0.001%
Ponceau S, 1% digitonin, 2 mM Na3VO4, 1 mM PMSF, 25 mM NaF, 1x Roche
protease inhibitor cocktail (no EDTA), pH 7.2). Cell lysates were triturated and
incubated on ice for 30 min, and then insoluble cell debris was pelleted by cent- 
rifugation at 16,000g for 30 min at 4 °C. Lysates were quantified for total protein 
using the BCA protein assay (Pierce), equalized for total protein, and separated by 
NativePAGE using the Novex BisTris gel system according to the manufacturer’s 
instructions (Invitrogen). Native gels were soaked in 10% SDS for 5 min before 
transfer to PVDF membrane (Millipore) and conventional western blotting. 
Antibodies used were anti-mIL-1β (R&D systems), anti-caspase 1 p10 (M20)
(Santa Cruz), anti-NLRC4 (gift of S. Mariathasan and V. Dixit, Genentech),
anti-NAIP5(961-978), anti-Myc (9E10) (Clontech), anti-β-actin (C4) (Santa
Cruz), anti-GFP (JL-8) (Clontech), anti-NAIP2(33-46), anti-rabbit IgG-HRP
and anti-mouse IgG-HRP (GE Healthcare), anti-goat IgG-HRP (Santa Cruz). In 
some cases, native gels were subjected to a second dimension of electrophoresis 
using SDS-PAGE. A 5.7-cm slice (lane) of natively resolved gel was placed in a 
dish containing 1x Laemmli sample buffer (50mM Tris-Cl (pH 6.8), 100 mM 
DTT, 2% (w/v) SDS, 0.1% bromophenol blue, 10% (v/v) glycerol) for 10 min, 
microwaved on high for 20s, and rocked for another 5 min before loading slice 
into the well of a precast 2D 4-12% SDS-PAGE gel (Invitrogen). For immunoprecipitations,
digitonin cell extracts (100 µg total protein) were pre-cleared with
25 µl of washed Protein-G-sepharose (GE Healthcare), and then cleared extracts
were incubated with 1 µg primary antibody (or isotype controls) overnight at 4 °C,
and complexes were captured the following day with 25 µl of washed Protein-G-sepharose.
Bound proteins were eluted by re-suspension in Laemmli sample buffer,
boiled for 5 min and separated by SDS-PAGE.

Expression constructs. NAIP5 wild-type and mutant constructs were cloned into 
the MSCV2.2 retroviral vector containing an IRES-GFP downstream of the mul- 
tiple cloning site. Expression is driven from the viral LTR and is considerably lower 
than that driven by the CMV promoter (data not shown). In general, wild-type and 
mutant ORFs were amplified between flanking BamHI and NotI sites, and a Kozak
sequence (GCCACC) was engineered to precede the start codon. The BamHI/NotI
digested PCR product was cloned into complementary BglII/NotI digested
MSCV2.2 vector. Wild-type NAIP5(1-1402) and NAIP5(ΔPloop) (Δ464-487)
were amplified from pcDNA3 using forward (AAAAGGATCCGCCACCATGGC
TGAGCATGGGGAGTCCTCCG) and reverse (AAAAGCGGCCGCTTACTCC
AGGATAACAGGAGAGAATGGGAC) primers. NAIP5(Δ347) (347-1402) was
generated using the forward primer (AAAAGGATCCGCCACCATGACC
TTGAAGTCCTCTGCAGAAG) in combination with wild-type NAIP5 reverse
primer. NAIP5(ΔLRR) (1-1039) was PCR cloned into BglII/PmeI sites of
MSCV2.2 vector using the wild-type NAIP5 forward primer (see above), and
reverse primer (GTTTAAACTCAGCCACTGCTGTTGAATAAACG). Wild-type
NLRC4 (1-1024) was PCR amplified from pCDNA3 using forward
(AAAAGGATCCGCCACCATGAACTTTATAAGGAACAACAGACG) and reverse 
(TTTTGCGGCCGCTTAAGCAGTCACTAGTTTAAAGGTGCC) primers, and 
ligated into BglII/NotI sites of MSCV2.2-IRES-GFP. NLRC4(K175R) was generated
by site-directed mutagenesis using the QuikChange protocol (Stratagene) using
forward (GAGTCTGGCAAAGGGCGATCGACCCTGCTGCAG) and reverse
(CTGCAGCAGGGTCGATCGCCCTTTGCCAGACTC) primers. NLRC4(ΔCARD)
(89-1024) was PCR amplified from pCDNA3 using forward primer (AAAA
GGATCCGCCACCATGTCTTATCAGGTCACAGAAGAAGACC) paired with
the wild-type reverse primer. NLRC4(ΔLRR) (1-656) was PCR amplified from
pCDNA3 using the reverse primer (TTTTGCGGCCGCTTACTTCCAGTTG
AAGAACAAAGACACAGC) in combination with the wild-type forward primer.
Wild-type NAIP2 was PCR cloned from pBluescript(SK-) into the BglII/NotI sites
of MSCV2.2-IRES-GFP using forward (AAAAGGATCCGCCACCATGGCAGC
CCAGGGAGAAGCCGTTGAGG) and reverse (TTTTGCGGCCGCTCACTTC
TGAATGACAGGAGAGAATGGCACTACCC) primers. Wild-type NAIP6 was
PCR cloned from pBluescript(SK-) into the BglII/NotI sites of MSCV2.2-IRES-GFP
using the wild-type NAIP5 forward primer (there is 100% conservation of the first
26 nucleotides among NAIP1, NAIP5 and NAIP6) and the reverse primer
(TTTTGCGGCCGCTTACTCCAGGACAACAGGAGAGAACGGGAC). Wild-type
NAIP1 was PCR cloned from pBluescript(SK-) into the BglII/NotI sites of
MSCV2.2-IRES-GFP using the wild-type NAIP5 forward primer, and the wild-type 
NAIP6 reverse primer (NAIP1 and NAIP6 share identical C-terminal nucleotides). 

MSCV2.2-caspase 1 and MSCV2.2-caspase 1(C284A) have been described 
previously. 

We modified the MSCV2.2 vector to contain a 6X-Myc-tag in the multiple cloning 
site (MCS) to facilitate generation of NH2-terminal fusion proteins 
(6X-Myc-MSCV2.2-IRES-GFP). The 6X-Myc-tag from pCS-6-Myc-SEC24 was PCR amplified using 
forward (AAAAAGATCTATCGATTTAAAGCTATG) and reverse (TTTTGC 
GGCCGCTGGCCGGCCTGAATTCA) primers for insertion into the BglII site 
of MSCV2.2. 6X-Myc-FlaA was generated by PCR amplifying full-length FlaA 
(L. pneumophila) from MSCV2.2-FlaA using forward (AAAAGCGGCCGCAG 
CTCAAGTAATCAACACTAATGTGGC) and reverse (TTTTGTCGACTATCGACCTAACAAAGATAATAC) 
primers and inserted into NotI/SalI sites. 
6X-Myc-PrgJ was generated by amplifying PrgJ from MSCV2.2-PrgJ using 
forward (AAAAGCGGCCGCATCGATTGCAACTATTGTCCC) and reverse 
(TTTTGTCGACTCATGAGCGTAATAGCGTTTC) primers and insertion into 
NotI/SalI sites. All constructs were fully sequenced to confirm their identity. 
Sequencing primers used for NAIP5: MSCV2.2-F, AAGCCCTTTGTA 
CACCCTAAGCC, MSCV2.2-R, CCTCACATTGCCAAAAGAC; NAIP5seq#1, 
CAGCAAAAGCACTGAACGCC; NAIP5seq#2, ATGAACAAATCCCTCGTAGC; 
NAIP5seq#3, TCACTCCTACCCAAGTCCAC; NAIP5seq#4, CTCAGACACACTTCACTAATGC; 
NAIP5seq#5, TCCCTTAGTTCCATCACACC; NAIP5seq#6, 
GACCCCTCTCTTTGTAGCAG; NAIP5seq#7, GAGTTTCTTGCTGCCGTGAG; 
NAIP5seq#8, TTAGAGGGTTGTGGCTGGTGTC; NAIP5seq#9, CTTCACA 
GAGTATTGAGTTCCG; NAIP5seq#10, TTGAGTTTTCTGGACGATGC; 
NAIP5seq#11, GGACAACTTGCCAAACCTAC. Sequencing primers for NAIP2: 
NAIP2seqF1, TGGTGATGAGAAAGAGTCAC; NAIP2seqF2, CTTCACAGAGT 
ATTGAGTTCCG; NAIP2seqR1, AGCAAATGGTCAGTGCCGAG; NAIP2seqR2, 
ACATACTGCTGCCACGAAG; NAIP2seqR3, AATCCAGTGTTCTCCCTCG; 
MSCV2.2-F and -R primers (see above). Sequencing primers for NAIP6: 
NAIP6seqF1, CAGAAAGCCTGTTACTGTTGAG; NAIP6seqR1, GATGGAACT 
AAGGGAGAGGTAG; NAIP6seqR2, TCTTGGTCTTCCTGCCTATC; MSCV2.2-F, 
MSCV2.2-R, NAIP5seq#7 (see above). 

Generation of Naip2 shRNA constructs. We used the lentiviral pLKO.1-TRC 
cloning vector (Addgene) to generate three vectors expressing three separate 
shRNAs targeting Naip2 (Naip2 shRNAs 1-3). Naip2 shRNA1 oligos, CCGGG 
CCATTGCCTTTCAACCTATACTCGAGTATAGGTTGAAAGGCAATGGCT 
TTTTG and AATTCAAAAAGCCATTGCCTTTCAACCTATACTCGAGTAT 
AGGTTGAAAGGCAATGGC. Naip2 shRNA2 oligos, CCGGCCATCCAGAAA 
CCTTGTTGTTCTCGAGAACAACAAGGTTTCTGGATGGTTTTTG and AA 
TTCAAAAACCATCCAGAAACCTTGTTGTTCTCGAGAACAACAAGGTTT 
CTGGATGG. Naip2 shRNA3 oligos, CCGGCTTTCAGTCTTGAAGAGACAA 
CTCGAGTTGTCTCTTCAAGACTGAAAGTTTTTG and AATTCAAAAACT 
TTCAGTCTTGAAGAGACAACTCGAGTTGTCTCTTCAAGACTGAAAG. We 
included pLKO.1-TRC control vector (Addgene ID#10879) or pLKO.1 scramble 
(Addgene ID#1864) as negative controls. 
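
The three oligo pairs listed above share one layout: a 21-nt sense target, a CTCGAG loop, and the reverse complement of the target, flanked by the overhangs used for cloning into pLKO.1. The sketch below rebuilds a pair from the sense sequence alone. The function names are illustrative and not taken from the original protocol, and the layout is inferred from the oligos printed here rather than from a formal specification.

```python
# Minimal sketch: assemble a pLKO.1-style shRNA oligo pair from a 21-nt sense target.
# Function names are hypothetical; the layout is inferred from the oligos in the text.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an upper- or lower-case DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def plko_oligos(sense):
    """Build (forward, reverse) oligos: sense arm, CTCGAG loop, antisense arm."""
    sense = sense.upper()
    hairpin = sense + "CTCGAG" + reverse_complement(sense)
    forward = "CCGG" + hairpin + "TTTTTG"   # 5' overhang plus terminator run
    reverse = "AATTCAAAAA" + hairpin        # hairpin is its own reverse complement
    return forward, reverse

# Sense target of Naip2 shRNA1, read off the forward oligo given in the text.
fwd, rev = plko_oligos("GCCATTGCCTTTCAACCTATA")
print(fwd)
print(rev)
```

Running this on the shRNA1 target reproduces both shRNA1 oligos as printed, which is a quick check that a transcribed sequence has not been corrupted.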

Knockdown of NAIP2 in primary macrophages. Bone marrow was collected 
from C57BL/6 mice on day 0, and plated into one 15-cm plate in macrophage 
differentiation media (see above). Lentiviruses encoding Naip2 shRNAs were generated 
according to the Addgene protocol. On day 2, primary bone marrow macrophages 
were collected, red blood cells were lysed, and cells were plated at 1 X 10° cells per 
well in 6-well plates. Macrophages were spinfected with lentiviral particles at 32 °C, 
90 min, 1,258g, and placed in a 32 °C incubator for 48 h. On day 4, cells were collected 
in cold PBS, re-plated on 10-cm plates containing fresh media containing puromycin 
(5 μg ml⁻¹) for selection, and placed at 37 °C. Puromycin-containing media was 
replaced on day 6. On day 8, macrophages were collected, counted and seeded at 
1 X 10° cells per well in a 96-well plate, and infected the following day. In some 
experiments, NAIP2 was knocked down in v-myc/v-raf immortalized bone-marrow-derived 
macrophages that were previously generated by use of the J2 virus. 

Listeria infections. Flagellin-deficient strains of Listeria monocytogenes 10403S 
were generated that express the secreted fusion protein ActAN100-PrgJ (Listeria 
ΔflaA-pPrgJ) or ActAN100-FlaA (Listeria ΔflaA-pFlaA), under the control of an 
IPTG-inducible promoter, as previously described. Macrophages were spinfected 
in antibiotic-free media at 400g for 10 min at an MOI of 5, with or without IPTG 
(1 mM), and placed at 37 °C for 30 min. The media was then replaced with complete 
media containing gentamycin (10 μg ml⁻¹) and IPTG, and supernatants 
were assayed for LDH release 5.5 h later. 

Salmonella infections. Salmonella enterica serovar Typhimurium strain LT2 or 
isogenic flagellin mutant (FliC⁻ FljB⁻) was grown in 10 ml LB standing cultures at 
37°C overnight. The next morning, the cultures were diluted 1:100 in LB and 
grown for 4h (standing culture, 37 °C). Bacteria were added to macrophages at an 
MOI of 10-30, followed by centrifugation at 400g for 10 min. Gentamycin (25 μg 
ml⁻¹) was added after 1 h to kill remaining extracellular bacteria. Caspase 1 
processing or LDH release was monitored as previously described. 


26. Krantz, B. A. et al. A phenylalanine clamp catalyzes protein translocation through 
the anthrax toxin pore. Science 309, 777-781 (2005). 
Although thousands of large intergenic non-coding RNAs (lincRNAs) have been identified in mammals, few have been 
functionally characterized, leading to debate about their biological role. To address this, we performed loss-of-function 
studies on most lincRNAs expressed in mouse embryonic stem (ES) cells and characterized the effects on gene 
expression. Here we show that knockdown of lincRNAs has major consequences on gene expression patterns, 
comparable to knockdown of well-known ES cell regulators. Notably, lincRNAs primarily affect gene expression in 
trans. Knockdown of dozens of lincRNAs causes either exit from the pluripotent state or upregulation of lineage 
commitment programs. We integrate lincRNAs into the molecular circuitry of ES cells and show that lincRNA genes 
are regulated by key transcription factors and that lincRNA transcripts bind to multiple chromatin regulatory proteins to 
affect shared gene expression programs. Together, the results demonstrate that lincRNAs have key roles in the circuitry 
controlling ES cell state. 


The mammalian genome encodes many thousands of large non-coding 
transcripts, including a class of ~3,500 lincRNAs identified 
using a chromatin signature of actively transcribed genes. These 
lincRNA genes have been shown to have interesting properties, 
including clear evolutionary conservation, expression patterns 
correlated with various cellular processes and binding of key 
transcription factors to their promoters, and the lincRNAs themselves 
physically associate with chromatin regulatory proteins. Yet, it 
remains unclear whether the RNA transcripts themselves have 
biological functions. Few have been demonstrated to have phenotypic 
consequences by loss-of-function experiments. As a result, the 
functional role of lincRNA genes has been widely debated. Various 
proposals include that lincRNA genes act as enhancer regions, with the 
RNA transcript simply being an incidental by-product, that 
lincRNA transcripts act in cis to activate transcription, and that 
lincRNA transcripts can act in trans to repress transcription. 

We therefore sought to undertake systematic loss-of-function 
experiments on all lincRNAs known to be expressed in mouse embryonic 
stem (ES) cells. ES cells are pluripotent cells that can self-renew 
in culture and can give rise to cells of any of the three primary germ 
layers, including the germ line. The signalling, transcriptional 
and chromatin regulatory networks controlling pluripotency 
have been well characterized, providing an ideal system to determine 
how lincRNAs may integrate into these processes. 

Here we show that knockdown of the vast majority of ES-cell- 
expressed lincRNAs has a strong effect on gene expression patterns 
in ES cells, of comparable magnitude to that seen for the well-known 
ES cell regulatory proteins. We identify dozens of lincRNAs that upon 
loss-of-function cause an exit from the pluripotent state and dozens of 
additional lincRNAs that, although not essential for the maintenance 
of pluripotency, act to repress lineage-specific gene expression pro- 
grams in ES cells. We integrate the lincRNAs into the molecular 
circuitry of ES cells by demonstrating that most lincRNAs are directly 
regulated by critical pluripotency-associated transcription factors and 
~30% of lincRNAs physically interact with specific chromatin regu- 
latory proteins to affect gene expression. Together, these results 
demonstrate a regulatory network in ES cells whereby transcription 
factors directly regulate the expression of lincRNA genes, many of 
which can physically interact with chromatin proteins, affect gene 
expression programs and maintain the ES cell state. 


lincRNAs affect global gene expression 


To perform loss-of-function experiments, we generated five lentiviral-based 
short hairpin RNAs (shRNAs) targeting each of the 226 
lincRNAs previously identified in ES cells (see Methods and Supplementary 
Table 1). These shRNAs successfully targeted 147 lincRNAs 
and reduced their expression by an average of ~75% compared to 
endogenous levels in ES cells (see Methods, Fig. 1a, Supplementary 
Fig. 1 and Supplementary Table 2). As positive controls, we generated 
shRNAs targeting ~50 genes encoding regulatory proteins, including 
both transcription and chromatin factors that have been shown to play 
critical roles in ES cell regulation; validated hairpins were 
obtained against 40 of these genes (Supplementary Table 2). As nega- 
tive controls, we performed independent infections with lentiviruses 
containing 27 different shRNAs with no known cellular target RNA. 
We infected each shRNA into ES cells, isolated RNA after 4 days, 
and profiled their effects on global transcription by hybridization to 
genome-wide microarrays (Fig. 1a, see Methods). We used a stringent 
procedure to control for nonspecific effects due to viral infection, 
generic RNA interference (RNAi) responses, or ‘off-target’ effects. 
Expression changes were deemed significant only if they exceeded 
the maximum levels observed in any of the negative controls, showed 
a twofold change in expression compared to the negative controls, and 
had a low false discovery rate (FDR) assessed across all genes based on 
permutation tests (Fig. 1b, see Methods). This approach controls for 
the overall rate of nonspecific effects by estimating the number and 
magnitude of observed effects in the negative control hairpins, where 
all effects are nonspecific. 

Figure 1 | Functional effects of lincRNAs. a, A schematic of lincRNA 
perturbation experiments. ES cells are infected with shRNAs, knockdown level 
is computed, the best hairpin is selected and profiled on expression arrays, and 
differential gene expression is computed relative to negative control hairpins. 
b, Example of a lincRNA knockdown. Top: genomic locus containing the 
lincRNA. Bottom: heat-map of the 95 genes affected by knockdown of the 
lincRNA; expression for control hairpins (red line) and expression for lincRNA 
hairpins (blue line) are shown. c, Distribution of number of affected genes upon 
knockdown of 147 lincRNAs (blue) and 40 well-known ES cell regulatory 
proteins (red). Points corresponding to five specific ES cell regulatory proteins 
are marked. 
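
The filtering logic described above (exceed every negative control, change at least twofold, and keep the expected rate of nonspecific calls low) can be sketched as follows. This is a simplified illustration rather than the procedure used in the study: the leave-one-out estimate stands in for the permutation-based FDR, the function names are hypothetical, and the expression values are invented log2 intensities.

```python
# Sketch under stated assumptions: per-gene filters plus a leave-one-out estimate of
# nonspecific calls. Not the authors' procedure; names and data are illustrative.
from statistics import mean

def changed_genes(sample, controls, min_abs_log2_fc=1.0):
    """Indices of genes that exceed all controls and change at least twofold.

    sample: log2 expression per gene for one hairpin.
    controls: list of such lists, one per negative-control hairpin.
    """
    hits = []
    for g, value in enumerate(sample):
        ctrl = [c[g] for c in controls]
        centre = mean(ctrl)
        max_ctrl_dev = max(abs(v - centre) for v in ctrl)  # most extreme control
        dev = abs(value - centre)
        if dev > max_ctrl_dev and dev >= min_abs_log2_fc:  # log2 of 2-fold is 1
            hits.append(g)
    return hits

def empirical_fdr(sample, controls, min_abs_log2_fc=1.0):
    """Treat each control as a mock knockdown to estimate the number of false calls."""
    observed = len(changed_genes(sample, controls, min_abs_log2_fc))
    null_counts = [
        len(changed_genes(held_out, controls[:i] + controls[i + 1:], min_abs_log2_fc))
        for i, held_out in enumerate(controls)
    ]
    fdr = mean(null_counts) / observed if observed else 0.0
    return observed, fdr

controls = [[10.0, 8.0, 5.0], [10.2, 8.1, 5.1], [9.8, 7.9, 4.9], [10.0, 8.0, 5.0]]
knockdown = [12.5, 8.05, 3.0]
print(changed_genes(knockdown, controls))
print(empirical_fdr(knockdown, controls))
```

In this toy data set the first and third genes pass both filters and no control hairpin produces a call, so the estimated rate of nonspecific calls is zero.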

For 137 of the 147 lincRNAs (93%), knockdown caused a signifi- 
cant impact on gene expression (Supplementary Table 3), with an 
average of 175 protein-coding transcripts affected (range: 20-936) 
(Fig. 1c, Supplementary Fig. 2 and Supplementary Table 4). These 
results were similar to those obtained upon knockdown of the 40 well- 
studied ES cell regulatory proteins: 38 (95%) showed significant 
effects on gene expression, with an average of 207 genes affected 
(range: 28 (for Dnmt3l) to 1,187 (for Oct4)) (Fig. 1c, Supplemen- 
tary Fig. 2 and Supplementary Table 4). Although some individual 
lincRNAs have been found to lead primarily to gene repression, we 
find that knockdown of the lincRNAs studied here largely led to 
comparable numbers of activated and repressed genes (Supplemen- 
tary Fig. 2 and Supplementary Table 4). To assess off-target effects 
further, we also profiled the effects of the second-best validated 
shRNA targeting 10 randomly selected lincRNA genes. In all cases, 
second shRNAs against the same target produced significantly similar 
expression changes (see Methods and Supplementary Table 5). These 
results indicate that the vast majority of lincRNAs have functional 
consequences on overall gene expression of comparable magnitude 
(in terms of number of affected genes and impact on levels) to the 
known transcriptional regulators in ES cells. 


lincRNAs affect gene expression in trans 

Following the observation that a few lincRNAs act in cis, some 
recent papers have claimed that most lincRNAs act primarily in 
cis. We found no evidence to support this latter notion: knockdown 
of only 2 lincRNAs showed effects on a neighbouring gene, only 13 
showed effects within a window of 10 genes on either side, and only 8 
showed effects on genes within 300 kb; these proportions are no greater 
than observed for protein-coding genes (Supplementary Fig. 3 and 
Supplementary Table 6). In short, lincRNAs seem to affect expression 
largely in trans. 

Our results contrast with a recent study that concluded that 
lincRNAs act in cis, based on the observation that knockdown of 7 
out of 12 lincRNAs affected expression of a gene within 300 kb (ref. 11). The 
explanation seems to be that the threshold in the previous study failed 
to account for multiple hypothesis testing within the local region. 
Accounting for this, the effects on neighbouring genes are no greater 
than expected by chance and are consistent with our observations 
here (see Methods). 
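
The multiple-testing argument can be made concrete with a short calculation. When several neighbouring genes are each tested at the same threshold, the chance that at least one passes by chance grows quickly. The numbers below (10 genes in the window, a per-gene threshold of 0.05) are hypothetical and chosen only to illustrate the effect; they are not taken from either study, and independence between tests is assumed.

```python
# Illustrative calculation with hypothetical numbers; assumes independent tests.
def prob_any_false_positive(n_tests, alpha):
    """Chance that at least one of n_tests null tests passes at threshold alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_threshold(n_tests, alpha):
    """Per-test threshold that keeps the window-wide error rate at or below alpha."""
    return alpha / n_tests

n_genes_in_window = 10   # assumed number of genes within the local window
alpha = 0.05             # assumed per-gene significance threshold

print(round(prob_any_false_positive(n_genes_in_window, alpha), 3))
print(bonferroni_threshold(n_genes_in_window, alpha))
```

Under these assumptions roughly 40% of lincRNAs would appear to affect a nearby gene even if none did, which is why an uncorrected per-gene threshold can inflate the apparent number of cis effects.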

Although some lincRNAs can regulate gene expression in cis (refs 11, 24, 25), 
determining the precise proportion of cis regulators requires more 
direct experimental approaches. We note that our results are consistent 
with observed correlations between lincRNAs and neighbouring 
genes, which may represent shared upstream regulation or local 
transcriptional effects. In addition, the lincRNAs studied here 
should be distinguished from transcripts that are produced at enhancer 
sites, the function of which has yet to be determined. 


lincRNAs maintain the pluripotent state 


We next sought to investigate whether lincRNAs have a role in regu- 
lating the ES cell state. Regulation of the ES cell state involves two 
components: maintaining the pluripotency program and repressing 
differentiation programs. To determine whether lincRNAs have a 
role in the maintenance of the pluripotency program, we studied their 
effects on the expression of Nanog, a key transcription factor that is 
required to establish, and uniquely marks, the pluripotent state. 
We infected ES cells carrying a luciferase reporter gene expressed 
from the endogenous Nanog promoter with shRNAs targeting 
lincRNAs or protein-coding genes. We monitored loss of reporter 
activity after 8 days relative to 25 negative control hairpins across 
biological replicates (see Methods). To ensure that the observed 
effects were not simply due to a reduction in cell viability, we excluded 
shRNAs that caused a reduction in cell numbers (see Methods, 
Supplementary Fig. 4 and Supplementary Table 7). Altogether, we 
identified 26 lincRNAs that had major effects on endogenous 
Nanog levels with many at comparable levels to the knockdown of 
the known protein-coding regulators of pluripotency such as Oct4 
and Nanog (Fig. 2a and Supplementary Table 7). This establishes that 
these lincRNAs have a role in maintaining the pluripotent state. 

To validate further the role of these 26 lincRNAs in regulating the 
pluripotent state, we knocked down these lincRNAs in wild-type ES 
cells and measured mRNA levels of pluripotency marker genes Oct4 
(also called Pou5f1), Sox2, Nanog, Klf4 and Zfp42 after 8 days. In all 
cases we observed a significant reduction in the expression of multiple 
pluripotency markers with >90% showing a significant decrease in 
both Oct4 and Nanog levels (Supplementary Fig. 5 and Supplementary 
Tables 8 and 9). To control for off-target effects, we studied additional 
hairpins targeting these lincRNAs. For 15 lincRNAs we had an effec- 
tive second hairpin. In all 15 cases, the second hairpin produced 
comparable reductions in Oct4 expression levels, showing that the 
observations were not due to off-target effects (Fig. 2b and Sup- 
plementary Table 10). Notably, >90% of lincRNA knockdowns 
affecting Nanog reporter levels led to loss of ES cell morphology 
(Fig. 2c and Supplementary Figs 6 and 7). Thus, inhibition of these 
26 lincRNAs leads to an increased exit from the pluripotent state. 


lincRNAs repress lineage programs 


To determine if lincRNAs act in repressing differentiation programs, 
we compared the overall gene expression patterns resulting from 
knockdown of the lincRNAs to published gene expression patterns 
resulting from induced differentiation of ES cells and assessed 
significance using a permutation-derived FDR (see Methods). 
These states include differentiation into endoderm, ectoderm, 
neuroectoderm, mesoderm and trophectoderm lineages. As a positive 
control for our analytical method, we confirmed the expected results 
that the expression pattern caused by Oct4 knockdown was strongly 
associated with the trophectoderm lineage and the pattern caused 
by Nanog knockdown was strongly associated with endoderm 
differentiation (Fig. 3a). 

Figure 2 | lincRNAs are critical for the maintenance of pluripotency. 
a, Activity from a Nanog promoter driving luciferase, following treatment with 
control hairpins (black) or hairpins targeting luciferase (green), selected 
protein-coding regulators (red), and lincRNAs (blue). b, Relative mRNA 
expression levels of Oct4 after knockdown of selected protein-coding (red) and 
lincRNA (blue) genes affecting Nanog-luciferase levels. The best hairpin 
(Hairpin 1) and second best hairpin (Hairpin 2) are shown. All knockdowns are 
significant with a P-value <0.01. Error bars represent standard error (n = 4). 
c, Morphology of ES cells and immunofluorescence staining of Oct4 for a 
negative control hairpin (black line) and hairpins targeting Oct4 (red line), and 
two lincRNAs (blue line). The first row shows bright-field images, the second 
row shows immunofluorescence staining of the Oct4 protein, and the third row 
shows DAPI staining of the nuclei. 


Using this approach, we identified 30 lincRNAs for which knockdown 
produced expression patterns similar to differentiation into specific 
lineages (Supplementary Table 11). Among these lincRNAs, 13 
are associated with endoderm differentiation, 7 with ectoderm differentiation, 
5 with neuroectoderm differentiation, 7 with mesoderm 
differentiation and 2 with the trophectoderm lineage (Fig. 3a). 
Consistent with these functional assignments, we observed that most 
(>85%) of the 30 lincRNAs associated with specific differentiation 
lineages showed upregulation of the well-known marker genes for 
the identified states upon knockdown (such as Sox17 (endoderm), 
Fgf5 (ectoderm), Pax6 (neuroectoderm), brachyury (mesoderm) and 
Cdx2 (trophectoderm)) (Fig. 3b, Supplementary Figs 8 and 9 and 
Supplementary Tables 12 and 13). 

Figure 3 | lincRNAs repress specific differentiation lineages. a, Expression 
changes for each lincRNA compared to gene expression of five differentiation 
patterns. Each box shows significant positive association (red, FDR <0.01) for 
Oct4 and Nanog (left) and for lincRNAs (right). b, Expression changes upon 
knockdown of Oct4 and Nanog (black bars) and representative lincRNAs (grey 
bars) for five lineage marker genes. The expression changes (FDR <0.05) are 
displayed on a log scale as the t-statistic compared to a panel of negative control 
hairpins. 


The fact that knockdown of these 30 lincRNAs induces gene 
expression programs associated with specific early differentiation 
lineages indicates that these lincRNAs normally are a barrier to such 
differentiation. Interestingly, most of the lincRNA knockdowns 
(~85%) that induce gene expression patterns associated with these 
lineages did not cause the cells to differentiate as determined by 
Nanog reporter levels (Supplementary Table 7) and Oct4 expression 
(Supplementary Fig. 10). This is consistent with observations for 
several critical ES cell chromatin regulators, such as the polycomb 
complex; loss-of-function of these regulators similarly induces 
lineage-specific markers without causing differentiation. 

Together, these data indicate that many lincRNAs have important 
roles in regulating the ES cell state, including maintaining the pluripo- 
tent state and repressing specific differentiation lineages. 


lincRNAs are targets of ES cell transcription factors 


Having demonstrated a functional role for lincRNAs in ES cells, we 
sought to integrate the lincRNAs into the molecular circuitry control- 
ling the pluripotent state. First, we explored how lincRNA expression 
is regulated in ES cells. Towards this end, we used published genome-wide 
maps of 9 pluripotency-associated transcription factors 
and determined whether they bind to the promoters of lincRNA 
genes. Of the 226 lincRNA promoters, ~75% are bound by at least 1 
of 9 pluripotency-associated transcription factors (including Oct4, 
Sox2, Nanog, c-Myc, n-Myc, Klf4, Zfx, Smad and Tcf3) with a median 
of 3 factors bound to each promoter (Fig. 4a, Supplementary Fig. 11 
and Supplementary Table 14), comparable to the proportion reported 
for protein-coding genes. Interestingly, the three core factors (Oct4, 
Sox2 and Nanog) bind to the promoters of ~12% of all ES cell 
lincRNAs and ~50% of lincRNAs involved in the regulation of the 
pluripotent state. 

To determine if lincRNA expression is functionally regulated by the 
pluripotency-associated transcription factors, we used shRNAs to 
knock down the expression of 5 of the 9 pluripotency-associated transcription 
factor genes for which we could obtain validated hairpins 
and profiled the resulting changes in lincRNA expression after 4 days. 
Upon knockdown of a transcription factor, ~50% of lincRNA genes 
whose promoters are bound by the transcription factor exhibit 
expression changes (Fig. 4a); this proportion is comparable to that 
seen for protein-coding genes whose promoters are bound by the 
transcription factor (Supplementary Fig. 12). The strong but imperfect 
correlation between transcription-factor binding and effect of 
transcription-factor knockdown is consistent with previous observations 
and may reflect regulatory redundancy in the pluripotency 
network. In addition, we profiled the knockdown of an additional 
7 pluripotency-associated transcription factors (including Esrrb, 
Zfp42 and Stat3). Altogether, for ~60% of the ES cell lincRNAs, we 
identified a significant downregulation upon knockdown of 1 of these 
11 transcription factors (Fig. 4b and Supplementary Table 15). 

Figure 4 | lincRNAs are direct regulatory targets of the ES cell 
transcriptional circuitry. a, A heat-map representing ChIP-Seq enrichments 
for nine transcription factors (columns) at lincRNA promoters (rows). The 
percentage of bound lincRNAs downregulated upon knockdown of the 
transcription factor is indicated in boxes. NA, not measured. Right: examples of 
lincRNAs from two clusters (‘core regulated’ and ‘Myc regulated’) showing 
their genomic neighbourhood and transcription factor binding. b, Left: a heat-map 
representing changes in lincRNA expression (rows) after knockdown of 11 
transcription factors (columns). Middle: effect of knockdown of Sox2, Oct4 and 
Nanog on expression levels of linc1405 (grey) and Oct4 (black). Right: effect of 
knockdown of Klf2, Klf4, n-Myc and Esrrb on expression levels of linc1428. 

After retinoic-acid-induced differentiation of ES cells, the ES cell 
lincRNAs show temporal changes across the time course with ~75% 
showing a decrease in expression compared to untreated ES cells 
(Supplementary Fig. 13 and Supplementary Table 16). Notably, all of 
the lincRNAs shown to regulate pluripotency are downregulated upon 
retinoic acid treatment (Supplementary Fig. 13). Our results establish that 
lincRNAs are direct transcriptional targets of pluripotency-associated 
transcription factors and are dynamically expressed across differenti- 
ation. Collectively, these results demonstrate that lincRNAs are an 
important regulatory component within the ES cell circuitry. 


lincRNAs bind diverse chromatin proteins 

To explore how lincRNAs carry out their regulatory roles, we studied 
whether lincRNAs physically associate with chromatin regulatory 
proteins in ES cells. We previously showed that many human 
lincRNAs can interact with the polycomb repressive complex, a complex 
that has a critical functional role in the regulation of ES cells. 
To determine whether the ES cell lincRNAs physically associate with 
the polycomb complex, we crosslinked RNA-protein complexes using 
formaldehyde, immunoprecipitated the complex using antibodies 
specific to both the Suz12 and Ezh2 components of polycomb, and 
profiled the co-precipitated lincRNAs using a direct RNA quantification 
method (see Methods). We performed immunoprecipitation of 
the polycomb complex across five biological replicates and eight mock-IgG 
controls, and we assessed significance using a permutation test (see 
Methods and Supplementary Fig. 16). Altogether, we identified 24 
lincRNAs (~10% of the ES cell lincRNAs) that were strongly enriched 
for both polycomb components (Fig. 5b and Supplementary Table 17). 
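
With five immunoprecipitation replicates and eight mock-IgG controls there are only 1,287 ways to relabel the 13 samples, so a permutation test can be enumerated exactly. The sketch below shows one such test on the difference of group means for a single lincRNA. The test statistic, the function name and the enrichment values are assumptions made for illustration; the Methods of the study define the procedure actually used.

```python
# Sketch of an exact one-sided permutation test; statistic and data are assumptions.
from itertools import combinations

def permutation_p_value(ip_values, control_values):
    """Fraction of relabelings whose mean difference is at least the observed one."""
    pooled = list(ip_values) + list(control_values)
    n_ip = len(ip_values)
    n_ctrl = len(control_values)
    total = sum(pooled)
    observed = sum(ip_values) / n_ip - sum(control_values) / n_ctrl
    as_extreme = 0
    n_relabelings = 0
    for idx in combinations(range(len(pooled)), n_ip):
        s = sum(pooled[i] for i in idx)
        diff = s / n_ip - (total - s) / n_ctrl
        if diff >= observed - 1e-9:   # small tolerance for floating-point rounding
            as_extreme += 1
        n_relabelings += 1
    return as_extreme / n_relabelings

# Invented signal values for one lincRNA: 5 IP replicates and 8 mock-IgG controls.
ip = [8.1, 7.9, 8.4, 8.0, 8.3]
mock = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 1.1]
print(permutation_p_value(ip, mock))
```

When every replicate exceeds every control, as in this toy example, only the true labeling is as extreme as the observation, which gives the smallest P value this design can produce (1 in 1,287).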

To determine if lincRNAs interact with additional chromatin 
proteins, we systematically analysed chromatin-modifying proteins 
that have been shown to have critical roles in ES cells. 
Specifically, we screened antibodies against 28 chromatin complexes 
(see Methods, Supplementary Fig. 14 and Supplementary Table 18) 
and identified 11 additional chromatin complexes that are strongly 
and reproducibly associated with lincRNAs (see Methods and Sup- 
plementary Figs 15 and 16). These chromatin complexes are involved 
in ‘reading’ (Prc1, Cbx1 and Cbx3), ‘writing’ (Tip60/P400, Prc2, Setd8, 
Eset and Suv39h1) and ‘erasing’ (Jarid1b, Jarid1c and Hdac1) histone 
modifications, as well as a chromatin-associated DNA binding protein 
(Yy1) (Fig. 5a). Altogether, we found that 74 (~30%) of the ES cell 
lincRNAs are associated with at least 1 of these 12 chromatin com- 
plexes (Fig. 5b and Supplementary Table 17). Although most of the 
identified interactions are with repressive chromatin regulators, this is 
probably due to limitations of our selection criteria and available 
antibodies. 

Many lincRNAs are strongly associated with multiple chromatin 
complexes (Fig. 5b). For example, we identified 8 lincRNAs that bind 
to the Prc2 H3K27 and Eset H3K9 methyltransferase complexes (writers 
of repressive marks) and the Jarid1c H3K4 demethylase complex (an 
eraser of activating marks). Consistent with this, the Prc2 and Eset 
complexes have been reported to bind at many of the same ‘bivalent’ 
domains and to associate functionally with the Jarid1c complex. 
Similarly, we identified a distinct set of 17 lincRNAs that bind to the 
Prc2 complex (‘writer’ of K27 repressive marks), Prc1 complex (‘reader’ 
of K27 repressive marks) and Jarid1b complex (‘eraser’ of K4 activating 
marks) (Fig. 5b), as well as other functionally consistent reader, writer 
and eraser combinations (Supplementary Fig. 17). One of several potential 
models consistent with these data is that lincRNAs may bind to 
multiple distinct protein complexes, perhaps serving as ‘flexible scaffolds’ 
to bridge functionally related complexes, as previously described 
for telomerase RNA. 

Figure 5 | lincRNAs physically interact with chromatin regulatory proteins. 
a, A schematic of the classes of chromatin regulators profiled: readers (blue), 
writers (orange) and erasers (green). b, A heat-map showing the enrichment of 
74 lincRNAs (rows) for 1 of 12 chromatin regulatory complexes (columns). The 
names are colour-coded by chromatin-regulatory mechanism. Major clusters 
are indicated by vertical lines with a description of the chromatin components. 

To determine if the identified lincRNA-protein interactions have a 
functional role, we examined the effects on gene expression resulting 
from knockdown of individual lincRNAs that are physically asso- 
ciated with particular chromatin complexes and from knockdown 
of genes encoding the associated complex itself (see Methods). For 
>40% of these lincRNA-protein interactions, we identified a highly 
significant overlap in affected gene expression programs compared to 
just ~6% for random lincRNA-protein pairs (see Methods and 
Supplementary Table 19). Other cases may reflect the limited power 
to detect the overlaps, because specific lincRNA-protein complexes 
may be related to only a fraction of the overall expression pattern 
mediated by the chromatin complex. 
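
One common way to ask whether two sets of affected genes overlap more than expected by chance is a hypergeometric tail probability. The text does not state which test was used, so the sketch below is an assumption offered for illustration only; the function name and the gene counts in the example are hypothetical.

```python
# Sketch only: hypergeometric overlap test, assumed here rather than taken from the study.
from math import comb

def overlap_p_value(n_universe, n_a, n_b, n_overlap):
    """P(overlap >= n_overlap) if n_b genes are drawn at random from n_universe,
    of which n_a belong to the first affected set."""
    total = comb(n_universe, n_b)
    upper = min(n_a, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_universe - n_a, n_b - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical example: 20,000 measured genes, 175 affected by a lincRNA knockdown,
# 207 affected by knockdown of its partner complex, 40 genes in common.
print(overlap_p_value(20000, 175, 207, 40))
```

A small tail probability for a lincRNA and its bound complex, compared with randomly paired lincRNAs and complexes, is the kind of contrast the >40% versus ~6% comparison above summarizes.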

Together, these data demonstrate that many ES cell lincRNAs phys- 
ically associate with multiple different chromatin regulatory proteins 
and that these interactions are probably important for the regulation of 
gene expression programs. 


Discussion 

Although the mammalian genome encodes thousands of lincRNA 
genes, few have been functionally characterized. We performed an 
unbiased loss-of-function analysis of lincRNAs expressed in ES cells 
and show that lincRNAs are clearly functional and primarily act in trans 
to affect global gene expression. We establish that lincRNAs are key 
components of the ES cell transcriptional network that are functionally 
important for maintaining the pluripotent state, and that many are 
downregulated upon differentiation. The ES cell lincRNAs physically 
interact with chromatin proteins, many of which have been previously 
implicated in the maintenance of the pluripotent state. In addition 
to chromatin proteins, lincRNAs interact with other protein complexes 
including many RNA-binding proteins (data not shown). 

Our data suggest a model whereby a distinct set of lincRNAs is 
transcribed in a given cell type and interacts with ubiquitous regula- 
tory protein complexes to form cell-type-specific RNA-protein com- 
plexes that coordinate cell-type-specific gene expression programs 
(Fig. 6). Because many of the lincRNAs studied here interact with 
multiple different protein complexes, they may act as cell-type-specific 
‘flexible scaffolds’ to bring together protein complexes into 
larger functional units (Fig. 6). This model has been previously 
demonstrated for the yeast telomerase RNA and suggested for the 
XIST and HOTAIR lincRNAs. The hypothesis that lincRNAs serve 
as flexible scaffolds could explain the uneven patterns of evolutionary 
conservation seen across the length of lincRNA genes: the more 
highly conserved patches could correspond to regions of interaction 
with protein complexes. 

Although a model of lincRNAs acting as ‘flexible scaffolds’ is attrac- 
tive, it is far from proven. Testing the hypothesis for lincRNAs will 
require systematic studies, including defining all protein complexes 
with which lincRNAs interact, determining where these protein 
interactions assemble on RNA, and ascertaining whether they bind 
simultaneously or alternatively. Moreover, understanding how lincRNA- 
protein interactions give rise to specific patterns of gene expression will 
require determination of the functional contribution of each inter- 
action and possible localization of the complex to its genomic targets. 


METHODS SUMMARY 

RNAi expression effects. We cloned five shRNAs targeting each lincRNA into a 
puromycin-resistant lentiviral vector. ES cells were plated on pre-gelatinized 
96-well plates and infected with lentivirus before addition of irradiated DR4 
mouse embryonic fibroblasts (MEFs). Media containing 1 µg ml⁻¹ puromycin was 
added 24 h after infection. On-target knockdown was assessed after 4 days and 
the best hairpin showing a knockdown >60% was selected. RNA from 147 
lincRNAs, 40 protein-coding genes and 27 negative controls was hybridized to 
Agilent microarrays. Differentially expressed genes were defined as having an 
FDR <5% and fold-change >2-fold compared to controls. 

Figure 6 | A model for lincRNA integration into the molecular circuitry of 
the cell. ES-cell-specific transcription factors (such as Oct4, Sox2 and Nanog) 
bind to the promoter of a lincRNA gene and drive its transcription. The 
lincRNA binds to ubiquitous regulatory proteins, giving rise to cell-type-specific 
RNA-protein complexes. Through different combinations of protein 
interactions, the lincRNA-protein complex can give rise to unique 
transcriptional programs. Right: a similar process may also work in other cell 
types with specific transcription factors regulating lincRNAs, creating 
cell-type-specific RNA-protein complexes and regulating cell-type-specific 
expression programs. 

Screening for pluripotency effects. Nanog-luciferase ES cells were infected and 
measured after 8 days. Hits were identified if they reduced luciferase levels 
(Z < −6) across all replicates and did not reduce AlamarBlue levels. Hits were 
validated in wild-type ES cells by measuring mRNA levels of Oct4, Nanog, Sox2, 
Klf4 and Zfp42. Oct4 expression was assessed using immunofluorescence staining 
and morphology was visually assessed. 

Lineage expression effects. Lineage expression programs were defined using 
published data sets (Gene Expression Omnibus GSE12982, GSE11523, and 
GSE4082) and curated gene expression signatures. Overlaps in gene expression 
effects were assessed using a modified GSEA. Expression changes in lineage 
markers were determined using qPCR. 

Transcription factor binding and regulation. ChIP-Seq data were downloaded 
(GSE11724 and GSE11431), aligned and analysed. lincRNA promoters were previously 
defined using H3K4me3 peaks. Changes in expression of the lincRNAs upon 
knockdown of the transcription factors were analysed using Agilent microarrays. 

Chromatin binding and overlap in expression. ES cells were crosslinked with 
formaldehyde, lysed, immunoprecipitated, washed and reverse crosslinked. RNA 
was hybridized to the Nanostring code set. We tested antibodies for 28 chromatin 
complexes and selected successful antibodies that had >10 lincRNAs exceeding a 
fivefold change and had significant enrichments across 3 replicates. We compared 
the overlap in gene expression using a modified GSEA. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS 

ES cell culture. V6.5 (genotype 129SvJae × C57BL/6) and Nanog-luciferase ES 
cells were co-cultured with irradiated C57BL/6 MEFs (GlobalStem; GSC-6002C) 
on pre-gelatinized plates as previously described. Briefly, cells were cultured in 
mES media consisting of knockout DMEM (Invitrogen; 10829018) supplemented 
with 10% FBS (GlobalStem; GSM-6002), 1% penicillin-streptomycin (Invitrogen; 
15140-163), 1% L-glutamine (Invitrogen; 25030-164), 0.001% β-mercaptoethanol 
(Sigma; M3148-100ML) and 0.01% ESGRO (Millipore; ESG1106). 

Picking lincRNA gene candidates. Using our previous catalogue of K4-K36 
defined lincRNAs along with the reconstructed full-length sequences we determined 
using RNA-Seq, we designed shRNA hairpins targeting each lincRNA 
identified in both sets. Specifically, we used the conservative K4-K36 definitions 
from our previous work that were expressed in mouse ES cells. We further 
filtered the list to include only multi-exonic lincRNAs that were reconstructed 
in mouse ES cells. Together, this yielded 226 lincRNA genes. 

Picking protein-coding gene candidates. We selected protein-coding gene controls 
consisting of both transcription factors and chromatin proteins. These proteins 
were selected based on their well-characterized role in regulating mouse ES cells and 
include Oct4 (Pou5f1), Sox2 (refs 17, 49), Nanog (refs 29, 30), Stat3 (ref. 50), Klf4 
(ref. 51) and Zfp42 (Rex1). In addition, we selected further transcriptional and 
chromatin regulators that were identified by RNAi screens as regulators of pluripotency 
and/or were found in smaller focused studies to have critical roles in the 
maintenance of the pluripotent state (such as Carm1 (ref. 53), Chd1 (ref. 54), Thap11 
(ref. 55), Suz12 (refs 18, 19, 36) and Setdb1 (refs 21, 56)). A full list is provided in 
Supplementary Table 2. 

shRNA design rules. For each lincRNA we designed five hairpins by extending 
the previously described design rules, accounting for the sequence content of the 
hairpin, miRNA seed matches, uniqueness to the target compared to the tran- 
scriptome and the genome, and number of lincRNA isoforms covered. 

For each lincRNA we enumerated all 21-mer sub-sequences and scored them 
as follows: (1) a ‘clamp score’ was computed by looking at the nucleotides at 
positions 18, 19 and 20. If all three positions contained an A/T it was assigned a 
score of 4, if two positions were A/T it was assigned a score of 1.5 and if one was 
A/T it was assigned a score of 0.8. We then looked at positions 16, 17, and 21; if all 
three were A/T it was assigned a score of 1.25, if two were A/T it was assigned a 
score of 1.1, and if one was A/T it was assigned a score of 0.8. The clamp score was 
computed as the product of these two scores. (2) A ‘GC score’ was computed by 
looking at the total GC percentage of the 21-mer sequence. If the sequence was 
<25% GC it was assigned a score of 0.01, if it was <55% it was assigned a score of 
3, if it was <60% it was assigned a score of 1, and if >60% it was assigned a score of 
0.01. (3) A ‘4-mer penalty’ of 0.01 was assigned for any hairpin containing the 
same nucleotide in 4 consecutive positions. (4) A ‘7 GC penalty’ of 0.01 was 
assigned to any hairpin containing any 7 consecutive G/C nucleotides. (5) We 
removed all hairpins containing an A in either position 1 or position 2 of the 
hairpin. (6) We removed all hairpins containing a repeat masked nucleotide. (7) 
Finally, we computed a ‘miRNA-seed penalty’ by looking at the forward positions 
11-17, 12-20 and 13-19 of the hairpin as well as the reverse complement of 
positions 14-20, 15-21, or 16-21 plus a 3’ C. We then looked up whether these 
positions matched known miRNA seeds and with what frequency. We computed 
the scores for the forward and reverse positions and defined the score as the 
product of the forward and reverse scores. The final score for each hairpin 
sequence is defined as the product of all seven scores. 
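
As an illustration only (this is not the authors' code), the scoring rules above could be sketched as follows. The function names are hypothetical, the score for a window with no A/T bases is not given in the text and is assumed to be neutral (1.0), rules (5) and (6) are treated as filters that return None, and the miRNA-seed penalty is passed in as a precomputed number because it depends on an external seed table.

```python
import re
from typing import Optional

def _at_count(seq: str, positions) -> int:
    """Count A/T bases at the given 1-based positions."""
    return sum(seq[p - 1] in "AT" for p in positions)

def clamp_score(seq: str) -> float:
    # Scores for 3, 2 and 1 A/T bases come from the text; the value for
    # 0 A/T bases is not stated, so 1.0 (neutral) is an assumption.
    first = {3: 4.0, 2: 1.5, 1: 0.8, 0: 1.0}[_at_count(seq, (18, 19, 20))]
    second = {3: 1.25, 2: 1.1, 1: 0.8, 0: 1.0}[_at_count(seq, (16, 17, 21))]
    return first * second

def gc_score(seq: str) -> float:
    gc = 100.0 * sum(b in "GC" for b in seq) / len(seq)
    if gc < 25:
        return 0.01
    if gc < 55:
        return 3.0
    if gc < 60:
        return 1.0
    return 0.01  # the text says ">60%"; exactly 60% is treated the same here

def hairpin_score(seq: str, repeat_masked: bool = False,
                  mirna_seed_penalty: float = 1.0) -> Optional[float]:
    """Score one 21-mer; None means the candidate is removed (rules 5 and 6)."""
    seq = seq.upper()
    if len(seq) != 21:
        raise ValueError("expected a 21-mer")
    if repeat_masked or "A" in seq[:2]:
        return None
    score = clamp_score(seq) * gc_score(seq)
    if re.search(r"(.)\1{3}", seq):   # same nucleotide 4 times in a row
        score *= 0.01
    if re.search(r"[GC]{7}", seq):    # 7 consecutive G/C
        score *= 0.01
    return score * mirna_seed_penalty
```

For example, a 21-mer with A/T at positions 18-20, two A/T bases among positions 16, 17 and 21, and about 52% GC receives 4 × 1.1 × 3 = 13.2 before any miRNA-seed penalty.
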

We then sorted the candidate hairpin sequences by score, breaking high- 
scoring ties by the total number of lincRNA isoforms that are covered by the 
hairpin. We then aligned each hairpin sequence against both the genome and the 
RefSeq-defined transcriptome (NCBI Release 39), and filtered any hairpin with 
fewer than three mismatches to any other gene or position in the genome. 
Candidate sequences were chosen for shRNA production by first picking the 
highest scoring candidate and then proceeding to successively lower scores. As 
each hairpin was selected, all other hairpins overlapping this hairpin were 
removed. We repeated this process until we identified five hairpins that covered 
each lincRNA. 
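
The selection procedure reads as a greedy pick over the ranked candidates. The sketch below is an assumption-laden illustration: candidates are hypothetical (start, score, isoforms_covered) tuples that have already passed the genome and transcriptome specificity filter, and two 21-mers are taken to overlap when their start positions differ by fewer than 21 nucleotides.

```python
def pick_hairpins(candidates, n_wanted=5, k=21):
    """Greedily pick non-overlapping hairpins, best score first.

    candidates: list of (start, score, isoforms_covered) tuples, where
    start is the 0-based offset of the k-mer in the lincRNA sequence.
    """
    # Sort by score (descending); ties go to the hairpin covering more isoforms.
    ranked = sorted(candidates, key=lambda c: (-c[1], -c[2]))
    chosen = []
    for start, score, isoforms in ranked:
        # Skip any candidate that overlaps a hairpin that was already selected.
        if all(abs(start - s) >= k for s, _, _ in chosen):
            chosen.append((start, score, isoforms))
            if len(chosen) == n_wanted:
                break
    return chosen
```
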

shRNA cloning and virus prep. We designed 1,143 hairpins targeting 226 
lincRNA genes. Of these, we successfully cloned 1,010 hairpins targeting 214 
lincRNAs. These hairpins were cloned into a vector containing a puromycin 
resistance gene and incorporated into a lentiviral vector as previously described. 
Briefly, synthetic double-stranded oligos that represent a stem-loop hairpin structure 
were cloned into the second-generation TRC (the RNAi Consortium) 
lentiviral vector, pLKO.5; the expression of a given hairpin produces a shRNA 
that targets the gene of interest. Lentivirus was prepared as previously described. 
Briefly, 100 ng of shRNA plasmid, 100 ng of packaging plasmid (psPAX2) 
and 10 ng of envelope plasmid (VSV-G) were used to transfect packaging cells 
(293T) with TransIT-LT1 (Mirus Bio). Virus was harvested 48 and 70 h after 
transfection. The two harvests were combined. Virus titres were measured as previously 
described. Briefly, we measured virus titres by infecting A549 cells with 
appropriately diluted viruses. Twenty-four hours after infection, puromycin was 
added to a final concentration of 5 µg ml⁻¹ and the selection proceeded for 48 h. 
The number of surviving cells, which is correlated to virus titre, was measured by 
AlamarBlue (BioSource) staining using the Envision 2103 Multilabel plate reader 
(PerkinElmer). 

Infection and selection protocol. V6.5 ES cells or Nanog-luciferase ES cells were 
plated at a density of 5,000 cells per well (8-day time point) or 25,000 cells per well 
(4-day time point) in 100 µl mES media onto pre-gelatinized 96-well dishes 
(VWR; BD356689). Cells were infected with 5 µl of a lentiviral shRNA stock 
and incubated at 37 °C for 30 min. Puromycin-resistant DR4 MEFs 
(GlobalStem; GSC-6004G) were then added to the plates at a density of ~6,000 
cells per well and incubated overnight at 37 °C, 5% CO2. After 24 h, all media was 
removed from the cells and replaced with media containing 1 µg ml⁻¹ puromycin. 
Media was then changed every other day with fresh media containing 1 µg ml⁻¹ 
puromycin. The end-point depended on the assay and was either 4 days after 
infection (knockdown validation and microarrays) or 8 days (reporters and 
qPCR of marker genes). 

RNA extraction. ES cells were infected and lysed at day 4 with 150 µl of Qiagen’s 
RLT buffer and three replicates of each virus plate were pooled for RNA extraction 
using Qiagen’s RNeasy 96-well columns (74181). RNA extraction was completed 
following Qiagen’s RNeasy 96-well protocol with the following 
modifications: 450 µl of 70% ethanol was added to 450 µl total lysate before the 
first spin. An additional RPE wash was added to the protocol, for a total of three 
RPE washes. 

lincRNA primer design and pre-screen. lincRNA primers were designed using 
primer3 (http://frodo.wi.mit.edu/primer3/). Specifically, we designed primers 
spanning exon-exon junctions by specifying each of the regions as preferred 
inclusion regions in the primer3 program. When a low-scoring primer pair 
(primer penalty <1) was available it was used. If none was available, we then 
identified all primers that contained amplicons that spanned an exon-exon junc- 
tion. In a few cases, when we could not identify a primer pair spanning an exon- 
exon junction, we designed primers within an exon of the lincRNA. For each 
primer pair, we tested the specificity against the transcriptome (RefSeq NCBI 
Release 39) and the genome (Mouse MM9) using the isPCR program 
(http://genome.ucsc.edu/cgi-bin/hgPcr). Specifically, we required that the primer pair 
amplify the lincRNA gene and no other genomic or gene amplicon. 

For each primer pair, we validated the quantification and specificity before use. 
Specifically, we tested primers in qPCR reactions using a dilution series of mouse 
ES cDNA including a no reverse transcriptase (RT) sample. We excluded any 
primer that did not have robust quantification across a 64-fold dilution curve, had 
high signal in the no RT sample, or had low detectable expression in the undiluted 
sample (cycle number >34). For primers that failed this validation we redesigned 
and tested new primers. 

Knockdown validation using qPCR. To determine if lincRNA hairpins were 
effective at knocking down the lincRNA of interest, we infected each hairpin into 
mouse embryonic stem cells, selected for lentiviral integration, and measured 
changes in the targeted lincRNA expression level. We isolated total cellular 
RNA after 4 days; this time point was chosen to allow for identification of robust 
changes while minimizing secondary effects due to differentiation of the ES cells. 
We reasoned that this would allow us to determine more direct effects due to 
RNAi rather than to differentiation. 

Gene panels were constructed that contained all five hairpins targeting a gene 
along with an empty vector control pLKO.5-nullT and the GFP-targeting hairpin 
clonetechGfp_437s1c1. cDNA was generated using 10 µl of RNA and 10 µl of 2× 
cDNA master mix containing 5× Transcriptor RT Reaction Buffer (Roche), 
DTT, MMLV-RT (Roche), dNTPs (Agilent; 200415-51), Random 9-mer oligos 
(IDT), Oligo-dT (IDT) and water. cDNA was diluted 1:9 and quantitative PCR 
was performed using 250 nM of each primer in 2× Sybr green master mix 
(Roche) and run on a Roche Light-Cycler 480. Target lincRNA expression and 
Gapdh levels were computed for each panel. lincRNA expression levels were 
normalized by Gapdh levels and this normalized value was compared to the 
reference control hairpins within the panel. Knockdown levels were computed 
as the average of the fold decrease compared to the two control hairpins. Hairpins 
showing a knockdown greater than 60% of the endogenous level were considered 
validated and the best validated hairpin from a lincRNA panel was selected for 
microarray studies. 
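
The knockdown calculation can be illustrated with a short sketch. It assumes that expression values are already on a linear scale (not raw Ct values) and that 'knockdown' means one minus the ratio of Gapdh-normalized expression to that of a control hairpin; the function names are hypothetical.

```python
def knockdown_fraction(target, gapdh, ctrl_targets, ctrl_gapdhs):
    """Average fractional knockdown (0.6 = 60% reduction) versus control hairpins."""
    norm = target / gapdh  # lincRNA level normalized by Gapdh
    reductions = [1.0 - norm / (ct / cg)
                  for ct, cg in zip(ctrl_targets, ctrl_gapdhs)]
    return sum(reductions) / len(reductions)

def best_validated(panel, threshold=0.6):
    """panel maps hairpin name -> knockdown fraction; returns the best
    hairpin exceeding the threshold, or None if no hairpin validates."""
    validated = {h: kd for h, kd in panel.items() if kd > threshold}
    return max(validated, key=validated.get) if validated else None
```
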

Picking candidates for microarray analysis. To assess the effects of a lincRNA 
on gene expression, we profiled the changes in gene expression after knocking 
down each lincRNA gene. Specifically, for each lincRNA with at least one validated 
hairpin we profiled the genome-wide expression level changes after knockdown 
across two independent infections (see above). To control for expression 
changes due to viral infection, we performed five independent infections containing 
no RNAi hairpin (pLKO.5-nullT). This control hairpin was embedded in each 
RNA preparation plate. To control for effects due to an off-target RNAi effect, we 
profiled 27 distinct negative control hairpins which do not have a known target in the 
cell. These hairpins included 6 RFP hairpins, 10 GFP hairpins, 6 luciferase hairpins 
and 5 LacZ hairpins. These hairpins provide a measurement of the variability of the 
RNAi response triggered due to nonspecific effects. Furthermore, we profiled 
hairpins targeting 147 lincRNAs, including 10 with a second best hairpin, and 40 
protein-coding genes in biological replicate. The hairpins and their replicates were 
randomly distributed across seven 96-well plates and prepared in batches. Each RNA 
preparation batch contained one pLKO hairpin and one clonetechGfp_437s1c1 
hairpin in a random location on the plate. To minimize batch effects, the plate 
locations of the biological replicates were scrambled and the positions within the 
plates were scrambled for all hairpins and replicates. 

Agilent microarray hybridization. Using Agilent’s One-Colour Quick Amp 
Labelling kit (5190-0442), we amplified and labelled total RNA for hybridization 
to prototype mouse lincRNA arrays (G4140-90040) according to manufacturer’s 
instructions with a few variations. The custom Agilent SurePrint G3 8x60K 
mouse array design used for this study (G4102A, AMADID 025725 G4852A) 
has probes to 21,503 Entrez genes and 2,230 lincRNA genes. A new updated 
version of this mouse design is commercially available that contains probes to 
34,017 Entrez gene targets as well as 2,230 lincRNA genes (G4825A). The cRNA 
samples were prepared by diluting 200 ng of RNA in 8.3 µl water and adding 
positive control one-colour RNA spike-in mix (Agilent, 5188-5282) that was 
diluted serially 1:20, then 1:25 and finally 1:10. We annealed the T7 promoter 
primer from the kit by incubating at 65 °C for 10 min. We prepared the cDNA 
master mix and added it to the annealed RNA and incubated at 40 °C for 2 h, 
followed by 65 °C for 15 min. We prepared the cRNA transcription master mix 
and added it to the cDNA and incubated at 40 °C for 2 h protected from light. We 
purified the labelled cRNA using Qiagen’s RNeasy 96-well columns (Qiagen, 
74181) by adding 350 µl of Qiagen RLT (without BME) to the cRNA followed 
by the addition of 250 µl of 95% ethanol before applying to the plate column. After 
a 4 min spin at 6,000 r.p.m., we washed the columns three times with 800 µl buffer 
RPE. We dried the columns by spinning for 10 min and eluted the cRNA with 
50 µl of water. We measured the cRNA yield and dye incorporation using the 
Nanodrop 8000 Microarray measurement setting. We mixed 600 ng of cRNA 
with blocking agent and fragmentation buffer (Agilent, 5190-0404) and fragmented 
for 30 min in the dark at 60 °C. We added 2× hybridization buffer to each sample 
and loaded 40 µl onto an 8-pack Hybridization gasket. We placed the microarray 
slides on top, sealed in the hybridization chamber, and incubated for 18 h at 65 °C. 
We washed the slides for 1 min in room temperature GE Wash Buffer 1 and then for 
1 min in 37 °C GE Wash Buffer 2 (Agilent 5188-5327, no triton addition). We 
scanned the microarrays using an Agilent Scanner C (G2565CA) using the following 
settings: dye channel = red & green, scan region = scan area (61 × 21.6 mm), 
scan resolution = 3 µm. We prepared all of the samples simultaneously using 
homogeneous master mixes to limit variability. Fragmentation and hybridization 
were staggered over time in batches of 3 to 4 slides (24 to 32 samples). 

Array filtering, normalization and probe filtering. Each array was processed and 
data extracted using the Agilent feature extraction software (G4462AA, Version 
10.7.3). Samples were retained if they passed all the following quality control statistics: 
AnyColourPrcntFeatNonUnifOL <1; eQCOneColourSpikeDetectionLimit >0.01 
and <2.0; Metric_absGE1E1aSlope between 0.9 and 1.2; Metric_gE1aMedCVProcSignal 
<8; gNegCtrlAveBGSubSig >−10 and <5; Metric_gNegCtrlAveNetSig 
<40; gNegCtrlSDevBGSubSig <10; Metric_gNonCntrlMedCVProcSignal 
<8; Metric_gSpatialDetrendRMSFilterMinusFit <15; SpotAnalysis_PixelSkewCookiePct 
>0.8 and <1.2. 

Gene expression values were determined using the gProcessedSignal intensity 
values. Probes were flagged if they were not detectable well above background or 
had an expression level lower than the lowest detectable spike-in control value. The 
values were floored across all samples by taking the maximum of the minimum 
non-flagged values across all experiments. Any value less than this maximum value 
was set to the maximum. This conservatively eliminates any detection variability 
across the samples due to stringency or other array variables. 
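
A minimal sketch of this flooring step, assuming a hypothetical layout in which expr[a][p] holds the signal of probe p on array a and flagged[a][p] marks flagged probes:

```python
def floor_expression(expr, flagged):
    """Floor all values at the maximum, across arrays, of each array's
    minimum non-flagged value. Returns (floored_values, floor)."""
    minima = [min(v for v, f in zip(row, flags) if not f)
              for row, flags in zip(expr, flagged)]
    floor = max(minima)
    floored = [[max(v, floor) for v in row] for row in expr]
    return floored, floor
```

With two arrays whose smallest non-flagged signals are 5 and 3, every value below 5 on either array is set to 5.
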

The result of this is a single value for each probe per array. To normalize 
expression values across arrays, we performed quantile normalization as previously 
described. Briefly, we ranked each array from lowest to highest expression. 
For each rank, we computed the average expression across arrays and assigned this 
value to each experiment at the associated rank. For each probe, we computed the difference 
between the second smallest expression value and the second largest expression 
value. If this difference was less than 2, we filtered the probe. This metric was 
chosen to eliminate bias due to single sample outliers. 
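
The two steps above (quantile normalization, then the outlier-robust range filter) can be sketched in plain Python. This is an illustration under assumptions: each array is a list of per-probe values of equal length, ties are broken by probe order, and the function names are hypothetical.

```python
def quantile_normalize(arrays):
    """arrays: one list of probe values per array, all the same length."""
    n = len(arrays[0])
    # For every array, the probe indices sorted from lowest to highest value.
    orders = [sorted(range(n), key=lambda i, a=a: a[i]) for a in arrays]
    # Average expression at each rank across arrays.
    rank_means = [sum(a[o[r]] for a, o in zip(arrays, orders)) / len(arrays)
                  for r in range(n)]
    out = [[0.0] * n for _ in arrays]
    for k, order in enumerate(orders):
        for rank, probe in enumerate(order):
            out[k][probe] = rank_means[rank]
    return out

def keep_probe(values, min_range=2.0):
    """Keep a probe only if (second largest - second smallest) >= min_range."""
    s = sorted(values)
    return (s[-2] - s[1]) >= min_range
```
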

Identifying significant gene expression hits from RNAi knockdowns. To control 
for nonspecific effects of shRNAs, we profiled 27 distinct 
negative control hairpins which do not have a known target in the cell. These 
hairpins provide a measurement of the variability of the expression profiles due to 
random variability or triggered by ‘off-target’ effects of the shRNA lentiviruses. 
Assuming that any observed effects in the negative control hairpins are due to off- 
target effects and observed effects in the targeting hairpins include a mix of both 
off-target effects and on-target effects, we use permutations of the negative 
controls to assign a FDR confidence level for being an on-target hit to each gene. 
As such, a gene would only reach genome-wide significance if the number of 
genes and scale of the effect was much larger than would be observed randomly 
among all of the expression changes found for the negative control hairpin. 
Specifically, for each gene we computed a t-statistic between shRNAs targeting 
the lincRNA and control shRNA samples. To assess the significance of each gene 
we permuted the sample and control groups retaining the relative sizes of the 
groups and computing the same t-statistic. We then assigned an FDR value to 
each gene by computing the average number of values in the permuted t-statistics 
that were greater than the observed value of interest and divided this by the 
number of all observed t-statistics that were greater than the observed value. 
We defined genes as significantly differentially expressed if the FDR was <5% 
and the fold-change compared to the negative controls was >2-fold. Using this 
approach, an effect would only reach a significant FDR if the scale is significantly 
larger than would be observed in the negative controls. Knockdown of a lincRNA 
was considered to have a significant effect on gene expression if we identified at 
least 10 genes that had an effect that passed all of the criteria. 
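
The permutation scheme can be sketched as below. Several choices are assumptions rather than statements about the original analysis: a Welch t-statistic is used because the text does not name the variant, absolute values are compared (two-sided), and 'greater than or equal' is used so that each gene counts itself and the denominator is never zero.

```python
import random
from math import sqrt
from statistics import mean, variance

def t_stat(a, b):
    """Welch t-statistic between two groups of values."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def permutation_fdr(treated, control, n_perm=200, seed=0):
    """treated/control: lists of samples, each a list of per-gene values.
    Returns (observed |t| per gene, permutation-based FDR per gene)."""
    rng = random.Random(seed)
    n_genes = len(treated[0])

    def gene_stats(t, c):
        return [abs(t_stat([s[g] for s in t], [s[g] for s in c]))
                for g in range(n_genes)]

    observed = gene_stats(treated, control)
    pooled = treated + control
    permuted = []
    for _ in range(n_perm):
        shuffled = pooled[:]
        rng.shuffle(shuffled)  # relabel samples, keeping the group sizes
        permuted.append(gene_stats(shuffled[:len(treated)],
                                   shuffled[len(treated):]))
    fdr = []
    for t in observed:
        expected = mean(sum(p >= t for p in perm) for perm in permuted)
        called = sum(o >= t for o in observed)
        fdr.append(min(1.0, expected / called))
    return observed, fdr
```

A gene would then be called differentially expressed when its FDR is below 5% and its fold-change exceeds twofold, as stated above.
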

Gene-neighbour analysis. We identified neighbouring genes based on the RefSeq 
genome annotation (NCBI Release 39). We excluded from analysis all RefSeq 
genes that corresponded to our lincRNA of interest but included all other coding 
and non-coding transcripts. We identified a significant hit as any lincRNA affect- 
ing a neighbour within 10 genes on either side with an FDR<0.05 and twofold 
expression change. To compute the closest affected neighbour, we classified all 
genes affected upon knockdown of the lincRNAs using the same criteria above. 
We computed the distance between each affected gene and the locus of the 
lincRNA gene (and protein-coding gene) that was perturbed and took the min- 
imum absolute distance across all affected genes. 

Analysis of expected number of neighbouring genes that will change 
by chance. To determine the expected number of differentially expressed 
‘neighbouring’ genes occurring by chance assuming that the knockdown has 
no effect on gene expression, we calculated the average number of genes in a 
300-kb window around a randomly selected gene in the human and mouse 
genome. We calculated this to be 11.2 (human) and 11.8 (mouse). For simplicity, 
we will conservatively round this down to 11. Assuming that no genes are 
changing between the knockdown and control, using a nominal P-value, which 
has a uniform distribution under the null hypothesis (nothing affected), we would 
expect to see a difference called in 5% of cases at a P-value of 0.05. If we test one 
locus, which has on average 11 neighbours, we would expect to identify 0.55 hits 
by chance (11 × 0.05 = 0.55). However, if we now test 12 loci we would expect to 
see 6.6 (12 × 0.55) knockdowns that appear to have an effect under the null 
hypothesis. 
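
The arithmetic above can be reproduced in a few lines. The last quantity, the probability of at least one chance call at a single locus, is not given in the text and is added here under the assumption that the 11 neighbour tests are independent.

```python
def expected_chance_hits(neighbours=11, alpha=0.05, loci=12):
    per_locus = neighbours * alpha   # expected false calls at one locus
    total = per_locus * loci         # expected false calls over all loci
    # Probability of at least one false call at a locus (independent tests).
    p_any = 1.0 - (1.0 - alpha) ** neighbours
    return per_locus, total, p_any
```
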

Luciferase analysis of Nanog ES lines. ES cells containing a Nanog-luciferase 
construct were infected in biological duplicate and monitored after 7 days. 
Luciferase activity was measured using Bright-Glo (Promega). All reagents and 
cells were equilibrated to room temperature. 100 µl Bright-Glo solution was 
added to each plate well. Plates were incubated in the dark at room temperature 
for 10 min and luciferase was measured on a plate reader. The luciferase units 
were normalized to the control hairpins. For each hairpin, we computed a Z-score 
relative to the negative control hairpins (excluding luciferase hairpins) and identified hits 
reducing luciferase levels by more than 6 standard deviations (Z < −6) in both 
independent replicates. In all cases we were able to identify a significant reduction 
in luciferase levels when using distinct hairpins targeting luciferase. To exclude 
hits that were due to an overall reduction in proliferation (which would also cause 
a reduction of Nanog positive cells in this read-out) we excluded all hairpins that 
caused a reduction in proliferation as measured by AlamarBlue incorporation 
(described below). AlamarBlue incorporation was measured in the same cells 
immediately before reading out Nanog-luciferase levels. 
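
A minimal sketch of the hit call, assuming the sample standard deviation of the negative controls is used and that the AlamarBlue viability result is supplied as a precomputed flag (both are assumptions; the function name is hypothetical):

```python
from statistics import mean, stdev

def luciferase_hit(replicates, controls, reduces_viability, cutoff=-6.0):
    """replicates: one luciferase reading per replicate for a hairpin.
    controls: per replicate, the readings of the negative-control hairpins.
    A hit must have Z < cutoff in every replicate and must not reduce viability."""
    zs = [(x - mean(c)) / stdev(c) for x, c in zip(replicates, controls)]
    return all(z < cutoff for z in zs) and not reduces_viability
```
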

AlamarBlue analysis of ES lines. After a 7-day infection, Nanog-luciferase cell 
viability was measured using AlamarBlue (Invitrogen; DAL1025). AlamarBlue was 
mixed with mES media in a 1:10 ratio, added to the cells and incubated at 37 °C for 
1h. Absorbance readings at 570 nm were taken. To control for possible effects due 
to virus titre, we measured AlamarBlue incorporation on both puromycin treated 
and non-puromycin treated samples for each infection. 

mRNA analysis of pluripotency markers. V6.5 ES cells were infected with 
shRNAs targeting lincRNAs, protein-coding genes, and 21 negative controls. 
After 8 days, RNA was extracted and mRNA levels of the Oct4, Nanog, Sox2, 
Klf4 and Zfp42 pluripotency markers were analysed using qPCR. Primer 
sequences are listed in Supplementary Table 9. Each sample was normalized to 
Gapdh levels. Significance was assessed compared to the negative control hairpins 
using a one-tailed t-test. 

To control for off-target effects, we analysed additional hairpins against the 26 
lincRNAs affecting Nanog-luciferase levels. Of the 26 lincRNAs, we identified 15 
lincRNAs that contained an additional hairpin that reduced lincRNA expression 
by >50%. V6.5 ES cells were infected with the best and additional hairpin across 
biological replicates for these 15 lincRNAs and 21 negative control hairpins. RNA 
was extracted after 8 days and Oct4 expression levels were determined using 
qPCR. Significance was assessed relative to the negative controls using a one- 
tailed t-test. 

Immunofluorescence. We crosslinked cells in 4% paraformaldehyde for 15 min, 
and washed in 1× PBS three times. To permeabilize the cells, we washed with 1× 
PBS + 0.1% Triton and then blocked in 1× PBS + 0.1% Triton + 1% BSA for 
45 min at room temperature. We incubated cells with anti-Pou5f1 antibody (Santa 
Cruz: SC-9081) at 1:100 dilution in blocking solution for 1.5 h at room temperature 
and then washed in blocking solution three times. Next, we incubated cells in anti- 
rabbit secondary antibody coupled to GFP (Jackson ImmunoResearch: 111-486- 
152) at a dilution of 1:1,000 in blocking solution for 45 min. Finally, we thoroughly 
washed cells in blocking solution three times, and added vectashield containing 
DAPI (VWR: 101098-044) to each well. 

Public data set curation. Traditionally, lineage markers are used to identify 
changes in phenotypic states. Although these markers can be good indicators 
of differentiation potential, there are two major limitations with this approach. 
First, there are multiple genes that are associated with each lineage so simply 
looking at one can often be misleading. Second, this approach only works for 
classifying states with well-characterized marker genes but would not work for a 
comprehensive characterization of the function in the cell. Therefore, we decided 
to take a different approach and look at the entire gene expression profile of each 
lincRNA knockdown to determine what cell state each lincRNA resembles. 

We curated a set of ES perturbations and differentiation states from publicly 
available sources. Specifically, we used the NCBI e-utils (http://eutils.ncbi.nlm. 
nih.gov/) to programmatically identify all published data sets containing keywords 
associated with embryonic stem cells. We filtered the list to only include mouse 
data sets that were generated across one of three commercial array platforms 
(Affymetrix, Agilent and Illumina). Following this approach, we manually curated 
the list to include data sets associated with ES cell perturbations (genetic deletions, 
RNAi, or chemical perturbations) and differentiation or induced differentiation 
profiles. This curation yielded 41 GEO data sets corresponding to >150 samples. 

Specifically, we defined differentiation lineage states using the following data 
sets. (1) Neuroectoderm: we downloaded a data set (GSE12982) corresponding to 
mouse ES cells containing a Sox1-GFP reporter construct. Upon differentiation 
of Sox1-GFP ES cells into embryoid bodies (EBs), Sox1-GFP-positive cells were 
collected and their global expression was profiled. In addition, we downloaded a 
data set (GSE4082) corresponding to direct neuroectoderm differentiation. 

(2) Mesoderm: we downloaded the same data set (GSE12982) as above, where 
the authors differentiated brachyury-GFP reporter ES cells into EBs and sorted 
and profiled brachyury-GFP-positive cells. 

(3) Endoderm: we downloaded a data set (GSE11523) corresponding to mouse 
ES cells which were engineered to overexpress GATA6. GATA6 overexpression 
has been shown to drive ES cells into a primitive endoderm-like state. 

(4) Ectoderm: we downloaded a data set (GSE4082) corresponding to mouse 
ES cells differentiated into primitive ectoderm-like cells with defined media. 

(5) Trophectoderm: we downloaded a data set (GSE11523) corresponding to 
mouse ES cells which were engineered to deplete Oct4. These cells have been shown 
to enter a trophectoderm-like state. To ensure specificity to the trophectoderm 
state, we also compared the expression effects to trophoblast stem cells. For all 
lincRNAs identified, we required a significant enrichment for both induced Oct4 
knockout and trophoblast stem cell programs. 

In addition, for all lineage states we used a curated discrete gene expression 
signature of differentiation which was previously functionally tested and shown 
to correspond specifically to differentiation into the associated states. 

Continuous enrichment analysis and phenotype-projection analysis. To determine 
relationships between lincRNA knockdowns and functional states, we used 
a modified Gene Set Enrichment Analysis approach that accounts for the continuous 
nature of the two data sets, similar to previously described extensions. 
For each pair of lincRNA knockdown and functional set we compute a 
continuous enrichment score. Specifically, (1) for each lincRNA knockdown 
we compute a normalized score matrix compared to a panel of negative control 
hairpins by computing a t-statistic for each gene between the replicate lincRNA 
knockdown expression values and the control knockdown values. (2) For each 
experiment, we sort the matrix by the normalized score such that the most 
differentially expressed upregulated gene is first and the most differentially 
expressed downregulated gene is last. Using this ordering we sort the functional 
data set such that the ordering corresponds to the differential rank of the lincRNA 
knockdown set. (3) We compute a score S_i as the running average of values from
the first position to position i. We then define the enrichment score E as the
maximum of the absolute value of S_i for all values of i > 10. We require i > 10 to
avoid small fluctuations in the beginning of the ranked list causing fluctuations in 
the enrichment score. This score is computed for each lincRNA knockdown by 
functional set. Because we have many lincRNA knockdowns and functional sets, 
in reality we have a matrix of scores and we will refer to the enrichment score of 
the ith knockdown and jth functional set as E_ij.
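
The three scoring steps above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function names, the dictionary inputs and the choice of a Welch-type t-statistic are assumptions, because the text does not state which t-statistic variant was used.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample, control):
    """Welch-type t-statistic between replicate knockdown values and
    negative-control values (assumed variant; needs >= 2 values per group
    and non-zero combined variance)."""
    var_s = stdev(sample) ** 2 / len(sample)
    var_c = stdev(control) ** 2 / len(control)
    return (mean(sample) - mean(control)) / sqrt(var_s + var_c)


def enrichment_score(knockdown_scores, functional_scores, min_index=10):
    """Continuous enrichment score E for one knockdown / functional pair.

    knockdown_scores:  {gene: normalized score for the lincRNA knockdown}
    functional_scores: {gene: value in the functional data set}
    Genes are ranked from most upregulated to most downregulated in the
    knockdown, the functional values are read in that order, and E is the
    maximum absolute running average S_i over positions i > min_index.
    """
    ranked = sorted(knockdown_scores, key=knockdown_scores.get, reverse=True)
    running_sum = 0.0
    best = 0.0
    for i, gene in enumerate(ranked, start=1):
        running_sum += functional_scores[gene]
        if i > min_index:
            best = max(best, abs(running_sum / i))
    return best
```

Calling `enrichment_score` once per knockdown and functional set fills the matrix of scores E_ij referred to in the text.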

To assess the significance of these scores, we compute a permutation-derived 
FDR and assign a confidence value for each projection. Specifically, to assess the 
significance of E_ij, we permute the lincRNA knockdown samples and control
samples and compute the enrichment score for each pair across all permutations.
To account for the FDR associated with many lincRNAs and functional sets, we
use the values of all permutations directly to assess the FDR level of E_ij.
Specifically, to assess the FDR for each enrichment value E_ij, we accumulate all
the permutation values for all lincRNA knockdowns and functional sets and
compute the number of observed values greater than E_ij as well as a vector
holding, for each permutation, the number of permuted values greater than E_ij.
The FDR is computed as the average
number of permuted values greater than E_ij divided by the observed number
greater than E_ij. Using this approach, we assign an FDR value to each lincRNA
knockdown by functional set and identify significant hits as those with an FDR 
<0.01. 
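
The permutation-derived FDR described above can be sketched as follows. The function name and input layout are hypothetical. Two details are assumptions not stated in the text: comparisons use "greater than or equal to", so that each observed score counts itself and the denominator is never zero, and the estimate is capped at 1.

```python
def permutation_fdr(observed, permuted):
    """Estimate an FDR for every enrichment score E_ij.

    observed: {(knockdown, functional_set): observed score}
    permuted: list with one dict per permutation, same keys as `observed`
    Scores from all pairs are pooled, as described in the text.
    """
    observed_values = list(observed.values())
    fdr = {}
    for pair, score in observed.items():
        # Number of observed scores at least as large as this one (>= 1).
        n_observed = sum(1 for v in observed_values if v >= score)
        # One count per permutation of permuted scores at least as large.
        per_permutation = [
            sum(1 for v in perm.values() if v >= score) for perm in permuted
        ]
        expected_false = sum(per_permutation) / len(per_permutation)
        fdr[pair] = min(1.0, expected_false / n_observed)  # cap is an assumption
    return fdr
```

Pairs whose estimate falls below 0.01 would then be reported as significant hits, following the threshold given in the text.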

To highlight the accuracy of this approach, we observed that for publicly avail- 
able gene perturbations for which we also perturbed the gene we were able to 
identify a significant association of target genes in ~75% of cases. Although the 
remaining few did not pass our conservative significance criteria, they also showed 
increased enrichments consistent with their common effects. In addition, the 
projected effects are highly reproducible across distinct experiments originating 
from many groups and across multiple expression platforms. Highlighting the 
specificity of this approach, we note that there are many profiles for which no 
lincRNA had a similar effect. 

Analysis of gene-expression overlaps between independent hairpin knock- 
downs. To determine whether independent hairpins targeting the same 
lincRNA gene share common gene targets, we computed a continuous enrichment 
score described above. Briefly, we computed a t-statistic for both hairpins against 
the negative controls. We then took the second best hairpin and sorted the genes. 
We scored the best hairpin affected genes based on this ranked order. We assessed 
the significance of this enrichment by permuting the samples and controls and 
assigned an FDR of the overlap of the expression effect (as described above). 
Discrete gene set analysis. Discrete gene sets were analysed using the Gene Set 
Enrichment Analysis with a slight modification to the scoring procedure to be 
more analogous to our continuous scoring procedure (described above). 
Specifically, we computed the average of the expression changes (defined by 
the t-statistic) for all genes within the discrete gene set upon knockdown”. 
Significance was assessed by permuting the control and sample labels and
recomputing the average statistic for each permutation. The FDR was estimated from
these values as described above.
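
As an illustration of the discrete scoring procedure, the sketch below averages per-gene t-statistics over a gene set and rebuilds the null distribution by shuffling sample labels. The names `gene_set_score` and `permuted_scores`, the `"kd"`/`"ctrl"` labels, the Welch-type statistic and the fixed random seed are all assumptions made for the example.

```python
import random
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch-type t-statistic (assumed variant); returns 0.0 if both groups are constant."""
    denom = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / denom if denom > 0 else 0.0


def gene_set_score(expr, labels, gene_set):
    """Average t-statistic over the genes of a discrete gene set.

    expr:   {gene: [expression value per sample]}
    labels: one "kd" or "ctrl" label per sample, in the same order
    """
    scores = []
    for gene in gene_set:
        kd = [v for v, lab in zip(expr[gene], labels) if lab == "kd"]
        ctrl = [v for v, lab in zip(expr[gene], labels) if lab == "ctrl"]
        scores.append(welch_t(kd, ctrl))
    return mean(scores)


def permuted_scores(expr, labels, gene_set, n_perm=100, seed=0):
    """Recompute the average statistic after shuffling the sample labels."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        out.append(gene_set_score(expr, shuffled, gene_set))
    return out
```

The permuted scores would feed the same pooled FDR estimate that is used for the continuous scores.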

Lineage marker gene analysis. We curated lineage marker gene sets from pub- 
lished work and publicly available sources’”***’. We identified lineage marker 
genes as significantly upregulated using the differential expression criteria outlined 
above. We validated the expression of these lineage marker genes for a selected set 
of lineage marker genes using qPCR (as described above) after a 4-day infection. 
Specifically, we looked at the expression of Fgf5 (ectoderm), Sox1 (neuroecto- 
derm), Sox17 (endoderm), brachyury (mesoderm) and Cdx2 (trophectoderm). 
Primer sequences are listed in Supplementary Table 9. Expression estimates were 
normalized to Gapdh and compared to a panel of 25 negative control hairpins. 
Identifying bound lincRNA promoters. We obtained genome-wide transcrip- 
tion factor binding data in mouse ES cells from two sources. The transcription 
factors Oct4, Sox2, Nanog and Tcf3 were downloaded from the Gene Expression 
Omnibus (GSE11724) and c-Myc, n-Myc, Zfx, Stat3, Smad1, Klf4 and Esrrb from 
GEO (GSE11431). For each ChIP-Seq data set, the raw reads were obtained from 
the SRA (http://www.ncbi.nlm.nih.gov/sra) and processed as follows. (1) The 
reads were all aligned to the mouse genome assembly (build MM9) using the 
Bowtie aligner®, requiring a single best placement of each read. All reads with 
multiple acceptable placements were removed from the analysis. (2) Binding sites 
were determined from the aligned reads using the MACS” (http://liulab.dfci. 
harvard.edu/MACS/) algorithm using the default parameters with -mfold 8 to 
account for varying read counts in the libraries. (3) lincRNA promoter regions were 
defined as previously described** using the location of the K4me3 peaks overlap- 
ping or within 5kb of the transcriptional start site determined by RNA-Seq 
reconstruction. (4) The transcription factor binding locations and lincRNA promoter
locations were intersected, and the enrichment level of the peak overlapping
a lincRNA promoter was assigned as the transcription factor binding enrichment for
each lincRNA. We defined transcription factor binding locations for protein- 
coding genes in a comparable way. (5) To exclude the possibility that some of 
this binding might be due to transcription factor binding at distal enhancers, we 
excluded all binding events that showed evidence of P300—a protein associated 
with active enhancers*—localization. Altogether, we only identified ~5% of 
promoters overlapping with any P300 enrichment signal, a slightly lower percent- 
age than identified for protein-coding gene promoters with detectable P300 signal. 
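
Steps (4) and (5) amount to an interval intersection followed by a filter. The sketch below is one way to express that, under stated assumptions: coordinates are half-open, a promoter overlapped by several peaks keeps the largest enrichment, and a binding event is dropped when its peak overlaps any P300 peak. None of these details is specified in the text, and the function names are hypothetical.

```python
def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def promoter_binding(promoters, tf_peaks, p300_peaks):
    """Assign a transcription factor binding enrichment to each promoter.

    promoters:  {lincRNA name: (chrom, start, end)}
    tf_peaks:   list of (chrom, start, end, enrichment)
    p300_peaks: list of (chrom, start, end)
    Returns {lincRNA name: enrichment} for bound promoters only.
    """
    bound = {}
    for name, region in promoters.items():
        for chrom, start, end, enrichment in tf_peaks:
            peak = (chrom, start, end)
            if not overlaps(region, peak):
                continue
            if any(overlaps(peak, p300) for p300 in p300_peaks):
                continue  # possible distal enhancer binding, excluded
            # Keeping the maximum over several peaks is an assumption.
            bound[name] = max(bound.get(name, 0.0), enrichment)
    return bound
```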
Identifying transcription-factor-regulated lincRNA genes. lincRNA probes on 
the Agilent microarray were analysed using the differential expression methodology 
described above after knockdown of the transcription factor and comparison to the 
negative control hairpins. To confirm the expression changes of these lincRNAs, we 
hybridized 12 transcription factor knockdowns on a custom lincRNA codeset using
the Nanostring nCounter assay’ (LIN-MES1-96). The knockdowns were profiled 
in biological duplicate along with 15 negative controls. Regulated lincRNAs were 
identified using the differential expression approach described above. 
Nanostring probe-set design. Nanostring probes against lincRNA genes were 
designed following the standard nanostring design principles with the following 
modifications specifically for the lincRNA probes. (1) To exclude possible cross- 
hybridization, probes were screened for cross-hybridization against both the 
standard mouse transcriptome as well as a background database constructed 
from all the lincRNA sequences. (2) To account for isoform coverage, a first pass 
design attempted to select a probe that would target as many isoforms as possible 
for each lincRNA. In cases where it was not possible to target all isoforms for a 
given lincRNA, the probe that targeted the largest number was selected, and 
additional probes were chosen when possible to target the remaining isoforms. 
(3) The standard restrictions on melting temperature and sequence composition 
were relaxed to include probes for as many lincRNAs as possible. 

Retinoic acid differentiation. V6.5 cells were cultured on gelatin-coated dishes 
in mES media in the absence of LIF. 5 µM of retinoic acid was added daily and cell
samples were taken daily for 6 days. RNA was extracted using Qiagen’s RNeasy 
spin columns following the manufacturer’s protocol. 

Western blots. 30 µg of mESC nuclear protein extracts were run on 10% Bis-Tris
gels (Invitrogen NP0316BOX) in MOPS buffer (Invitrogen NP0001) at 75 V for 
20 min followed by 120 V for 1 h. Gels were incubated for 30 min in 20% methanol 
transfer buffer (Invitrogen NP0006-1) and transferred onto PVDF membranes 
(Invitrogen 831605) at 20 V for 1h using the Bio-Rad semi-dry transfer system 
(170-3940). Membranes were blocked in Blotto (Pierce, 37530) at room
temperature for 1 h. Antibodies were diluted in Blotto and membranes were incubated
overnight at 4 °C. Antibodies were used at the following dilutions: Ezh2
1:2,000, Suz12 1:5,000, hnRNPH 1:1,000, Ruvbl2 1:1,000, Jarid1b 1:500, Hdac1
1:250, Cbx6 1:500, Yy1 1:500. All antibodies tested were raised in rabbit. The next
day, membranes were washed 3× in 0.1% TBST for 5 min each. The membranes
were probed with anti-rabbit horseradish peroxidase (GE Healthcare; NA9340V)
at a 1:10,000 dilution, washed 3× in 0.1% TBST, incubated in ECL reagent (GE
Healthcare RPN2132) and exposed.

Crosslinked RNA immunoprecipitation. V6.5 mES cells were fixed with 1%
formaldehyde for 10 min at room temperature, quenched with 2.5 M glycine,
washed three times with 1× PBS, harvested by scraping, pelleted and re-suspended in
modified RIPA lysis buffer (150 mM NaCl, 50 mM Tris, 0.5% sodium deoxycholate,
0.2% SDS, 1% NP-40) supplemented with RNase inhibitors (Ambion,
AM2694) and protease inhibitors. For UV crosslinking experiments, cells were
irradiated with 254 nm UV light. Cells were kept on ice and crosslinked in 1× PBS
using 400,000 µjoules cm⁻².

Cell suspension was sonicated using a Branson 250 Sonifier for 3 × 20 s cycles at
20% amplitude. 10 µl of Turbo DNase (Ambion, AM2238) was added to sonicated
material, incubated at 37 °C for 10 min, and spun down at max speed for 10 min at
4 °C. Protein-G beads were washed and pre-incubated with antibodies for 30 min
at room temperature. Lysate and beads were incubated at 4 °C for 2 h. Beads were
washed 3× using the following wash buffer (1× PBS, 0.1% SDS, 0.5% NP-40)
followed by 2× using a high salt wash buffer (5× PBS, 0.1% SDS, 0.5% NP-40).
Crosslinks were reversed and proteins were digested with 5 µl proteinase-K (NEB,
P8102S) at 65 °C for 2-4 h. RNA was purified using phenol/chloroform/isoamyl
alcohol and RNA was precipitated in isopropanol.
Nanostring hybridization. 500 ng of total RNA was hybridized for 17 h using the 
lincRNA code set. The hybridized material was loaded into the nCounter prep 
station followed by quantification on the nCounter Digital Analyser following the 
manufacturer’s protocol. For RNA immunoprecipitation experiments, we used a 
modified protocol. After reverse crosslinking, RNA was extracted using phenol/ 
chloroform and ethanol precipitation methods and re-suspended in 10 µl of H₂O.
5 µl of the eluted material was hybridized for 17 h using the lincRNA code set.


Nanostring analysis. Probe values were normalized to negative control probes by 
dividing the value of the probe by the maximum negative control probe. Probe 
values were floored to a normalized value of 3 (threefold higher than maximum 
negative control). Probes with no value greater than this floor across all samples 
were removed from the analysis. The values were log transformed. To control for 
variability between runs and different input material amounts, we normalized all 
samples simultaneously using the quantile normalization approach described 
above. The result is a set of normalized log-expression values for each probe 
normalized across all experiments. 
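
The normalization steps above can be sketched as a small pipeline. This is an illustrative outline only: the function names are hypothetical, the logarithm base is assumed to be 2 (the text says only "log transformed"), and the quantile step is a plain rank-mean quantile normalization standing in for the approach referenced as "described above", which lies outside this section.

```python
from math import log2


def normalize_sample(counts, negative_probes, floor=3.0):
    """Divide each probe by the maximum negative-control probe and floor the result."""
    neg_max = max(counts[p] for p in negative_probes)
    return {
        probe: max(value / neg_max, floor)
        for probe, value in counts.items()
        if probe not in negative_probes
    }


def quantile_normalize(samples):
    """Replace the k-th ranked value of each sample by the mean k-th ranked value."""
    probes = sorted(samples[0])
    ranked = [sorted(s[p] for p in probes) for s in samples]
    rank_means = [sum(column) / len(column) for column in zip(*ranked)]
    normalized = []
    for s in samples:
        order = sorted(probes, key=lambda p: s[p])
        normalized.append({p: rank_means[k] for k, p in enumerate(order)})
    return normalized


def nanostring_pipeline(raw_samples, negative_probes, floor=3.0):
    """raw_samples: list of {probe: raw count}; returns normalized log values."""
    scaled = [normalize_sample(s, negative_probes, floor) for s in raw_samples]
    # Drop probes that never rise above the floor in any sample.
    kept = [p for p in scaled[0] if any(s[p] > floor for s in scaled)]
    logged = [{p: log2(s[p]) for p in kept} for s in scaled]  # base 2 is an assumption
    return quantile_normalize(logged)
```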

Validation of RNA immunoprecipitation methods. To validate our formaldehyde- 
based RNA immunoprecipitation method we immunoprecipitated the RNA bind- 
ing protein hnRNPH, which has a role in mRNA splicing” and identified the 
associated RNAs. Consistent with known interactions, we identified a strong 
enrichment for its binding to intronic regions of mRNA genes. We validated these 
observed results in mouse ES cells by performing UV-crosslinking experiments”? 
and obtained nearly identical results. The correlation between UV and
formaldehyde crosslinked samples was similar to that between biological
replicates of either crosslinking method, and the enrichments were highly
comparable (data not shown).

Antibody selection. We selected chromatin proteins that have been implicated in 
regulation of the pluripotent state along with their known associated ‘reader’, 
‘writer’ and ‘eraser’ complexes. Specifically, we tested antibodies against 40 chro- 
matin proteins, corresponding to 28 chromatin complexes. In many cases, we 
tested multiple antibodies against the same target protein to try to identify an 
antibody that worked well for immunoprecipitation. A full list of tested com- 
plexes and their associated antibodies is listed in Supplementary Table 18. 
Determining significant chromatin-lincRNA enrichments. We tested each 
antibody using formaldehyde crosslinked cells and had a two-step procedure 
for considering an antibody successful. (1) We tested all selected antibodies in 
batches, with each batch containing a mock-IgG (Santa Cruz) negative control 
and hnRNPH (Bethyl) positive control. Batches with variability in either the 
mock-IgG or hnRNPH controls were excluded and retested. For each successful 
batch, we computed enrichment for each lincRNA between the tested antibody 
and mock-IgG. We considered an antibody successful in the first step if the 
highest enrichment level exceeded a fivefold change compared to the mock- 
IgG control and more than 10 lincRNAs exceeded this threshold. Although this 
approach can yield false positives (antibodies that pass but are not efficient) it 
significantly reduced the number of antibodies to be tested in the next step. (2) 
For all antibodies that successfully passed the first criterion, we performed immu- 
noprecipitation on two additional biological replicates along with 4 mock-IgG 
controls. We computed a t-statistic for each lincRNA compared to the controls
and assessed the significance using a permutation test, by permuting the samples
and IgG samples (as above). Hits were considered significant if they exceeded a
t-statistic cutoff of 2 (log scale) compared to the controls and had an FDR <0.2.
We allowed a slightly higher FDR cutoff because the number of permutations was
far smaller, yielding lower power to estimate the FDR. Only antibodies yielding
significant lincRNAs were considered successful. In total, we identified 12 of the 
28 complexes (55 antibodies) with at least one successful antibody. 
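
The two-step antibody criterion can be written as two small predicates. The sketch assumes positive, linear-scale normalized values for the first step and precomputed t-statistics and FDR values for the second. The function names and input dictionaries are hypothetical.

```python
def passes_first_step(antibody, mock_igg, fold=5.0, min_hits=10):
    """First step: more than `min_hits` lincRNAs enriched above `fold` over mock-IgG.

    antibody, mock_igg: {lincRNA: normalized value on a linear scale (> 0)}
    """
    hits = [name for name in antibody if antibody[name] / mock_igg[name] > fold]
    return len(hits) > min_hits


def significant_hits(t_stats, fdr, t_cutoff=2.0, fdr_cutoff=0.2):
    """Second step: lincRNAs passing both the t-statistic and FDR cutoffs."""
    return sorted(
        name
        for name in t_stats
        if t_stats[name] > t_cutoff and fdr[name] < fdr_cutoff
    )
```

An antibody would count as successful only if `passes_first_step` is true and `significant_hits` returns at least one lincRNA.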
Determining significant overlaps between lincRNA and chromatin protein 
knockdown effects. To determine the functional overlap between the lincRNA 
and the chromatin complexes it physically interacts with, we compared the effects 
on gene expression upon knockdown of the lincRNA and the associated protein 
complex. To do this, we used the gene expression profiles determined for each 
lincRNA knockdown and knockdowns of 9 of the 12 identified chromatin com- 
plexes for which we had good hairpins. We defined each interaction between a 
lincRNA and protein, and computed a continuous enrichment score, generated 
all permutations of the control hairpins and sample hairpins and assigned an FDR 
to the scores (as described above). At an FDR <0.05 we identified 43% of the 
interactions to be significant. For 69% of the interactions, we were able to identify 
an overlap at an FDR <0.1. 

Mirror extreme BMI phenotypes associated with
gene dosage at the chromosome 16p11.2 locus

Both obesity and being underweight have been associated with
increased mortality'*. Underweight, defined as a body mass index 
(BMI) ≤18.5 kg per m² in adults and < −2 standard deviations
from the mean in children, is the main sign of a series of hetero- 
geneous clinical conditions including failure to thrive’, feeding 
and eating disorder and/or anorexia nervosa®’. In contrast to 
obesity, few genetic variants underlying these clinical conditions 
have been reported*”. We previously showed that hemizygosity of a 
~600-kilobase (kb) region on the short arm of chromosome 16 
causes a highly penetrant form of obesity that is often associated 
with hyperphagia and intellectual disabilities’’. Here we show that 
the corresponding reciprocal duplication is associated with being 
underweight. We identified 138 duplication carriers (including 
132 novel cases and 108 unrelated carriers) from individuals 
clinically referred for developmental or intellectual disabilities 
(DD/ID) or psychiatric disorders, or recruited from population- 
based cohorts. These carriers show significantly reduced postnatal 
weight and BMI. Half of the boys younger than five years are 
underweight with a probable diagnosis of failure to thrive, whereas 
adult duplication carriers have an 8.3-fold increased risk of being 
clinically underweight. We observe a trend towards increased 
severity in males, as well as a depletion of male carriers among 
non-medically ascertained cases. These features are associated with 
an unusually high frequency of selective and restrictive eating 
behaviours and a significant reduction in head circumference. 
Each of the observed phenotypes is the converse of one reported 
in carriers of deletions at this locus. The phenotypes correlate with 
changes in transcript levels for genes mapping within the duplica- 
tion but not in flanking regions. The reciprocal impact of these 
16p11.2 copy-number variants indicates that severe obesity and 
being underweight could have mirror aetiologies, possibly through 
contrasting effects on energy balance. 

Copy-number variants (CNVs) at the 16p11.2 locus have been asso- 
ciated with cognitive disorders including autism (deletions) and schizo- 
phrenia (duplications)'*”, conditions that have been suggested to lie at 
opposite ends of a single spectrum of psychiatric phenotypes’*. We and 
others have reported that a deletion of this region spanning 28 genes 
(Supplementary Table 1) increases the risk of morbid obesity 43-fold 
(Supplementary Fig. 1)'°. We hypothesized that the reciprocal 
duplication, with its resulting increase in gene dosage, may influence 
BMI in a converse manner. The duplication was identified in 73 out of 
31,424 patients with DD/ID, a frequency consistent with previous 
reports’’ (Table 1). Four additional cases were identified among 1,080 
patients affected by bipolar disease or schizophrenia. Compared to its 
prevalence in seven European population-based genome-wide asso- 
ciation study (GWAS) cohorts'*"* (31 out of 58,635 individuals), the 
duplication was significantly more frequent in both the DD/ID cohorts 
(P = 4.23 X 10 '°; odds ratio = 4.4, 95% confidence interval = 2.9- 
6.9) and the psychiatric cohorts (P = 3.6 × 10⁻³; odds ratio = 7.0,
95% confidence interval = 1.8-19.9) (Table 1), strengthening previous 
reports of similar associations'*’’. Our data do not support a two-hit 
model’? for the effects of 16p11.2 duplications or deletions (Supplemen- 
tary Text and Supplementary Table 2). 
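
The odds ratios quoted above follow directly from the carrier counts given in the text. The short sketch below recomputes them, together with a carrier frequency and a normal-approximation (Wald) 95% interval. The Wald formula is an assumption about how the intervals in Table 1 were obtained, and the Fisher's exact test P values and the odds-ratio confidence intervals are not reproduced here.

```python
from math import sqrt


def odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Odds ratio of carrying the rearrangement in cases versus controls."""
    case_non = case_total - case_carriers
    control_non = control_total - control_carriers
    return (case_carriers * control_non) / (case_non * control_carriers)


def frequency_with_wald_ci(carriers, total, z=1.96):
    """Carrier frequency with a Wald interval (assumed method), clipped at 0."""
    p = carriers / total
    half_width = z * sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), p + half_width


# Counts taken from the text: 73/31,424 DD/ID, 4/1,080 psychiatric,
# 31/58,635 population-based controls.
dd_id = odds_ratio(73, 31424, 31, 58635)
psychiatric = odds_ratio(4, 1080, 31, 58635)
freq, low, high = frequency_with_wald_ci(73, 31424)

print(f"DD/ID odds ratio: {dd_id:.1f}")
print(f"Psychiatric odds ratio: {psychiatric:.1f}")
print(f"DD/ID frequency: {100 * freq:.2f}% ({100 * low:.2f}-{100 * high:.2f})")
```

With these counts the sketch gives odds ratios of 4.4 and 7.0 and a DD/ID duplication frequency of 0.23% (0.18-0.29), in agreement with the values reported in the text and in Table 1.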


We compared available data on weight, height and BMI for 106 
independent duplication carriers (including published cases) to data 
for reference populations matched for gender, age and geographical 
location (Table 2, Methods and Supplementary Tables 3 and 4). The 
duplication was strongly associated with lower weight (mean Z-score 
−0.56; P = 4.4 × 10⁻⁴) and lower BMI (mean Z-score −0.47;
P = 2.0 × 10⁻³) (Table 2 and Supplementary Table 5). Birth parameters
(n = 48) were normal, indicating a postnatal effect. Adults
carrying the duplication had a relative risk of being clinically under- 
weight (BMI <18.5) of 8.3 (95% confidence interval = 4.4-15.9, 
P=1.53 X10 '°) (see Methods). Concordantly, none of the 3,544 
patients in our obesity cohorts'®!” carried the duplication (Table 1). 

To investigate these associations further, we carried out separate 
analyses of carrier patients (DD/ID and psychiatric) and non-medically 
ascertained carriers (population-based cohorts, plus 11 transmitting 
parents and three other affected first-degree relatives for whom data 
were available) (Table 2). Each category had significantly lower weight 
and BMI, with similar effect sizes. However, the proportion of underweight
cases (BMI ≤ −2 s.d.) was higher in the first group than in the
second group (17 out of 76 compared to 2 out of 40; P = 0.017). Note 
that the impact of the duplication on underweight status might be 
underestimated here owing to prescription of antipsychotic treatments 
that are often associated with weight gain’? (Supplementary Table 6). 

Having demonstrated an association of the duplication with being 
underweight, we investigated the implications of gender for the resulting 
phenotypes (Fig. 1, Supplementary Fig. 2 and Supplementary Table 7). 
In DD/ID patients, the impact of the duplication on being underweight 
is stronger in males; the effect in females is in the same direction, but is 
smaller and not statistically significant (Table 2). A similar and signifi- 
cant difference (P = 0.0168) was observed in adult carriers (all groups 
combined): the relative risk of being underweight for males is 23.2 
(95% confidence interval = 9.1-59.3, P= 4.6 X 10 *'); for females it 
is only 4.7 (95% confidence interval = 1.9-11.8, P=9.9 X 10“). A 
gender bias was also observed in the ascertainment of DD/ID duplica- 
tion carriers, in which we have an excess of males (51 males:33 females, 
P = 0.044). By contrast, carriers from the general population showed a 
strong overrepresentation of females (10 males:21 females, P = 0.035) 
(Supplementary Text). A similar bias was observed among transmit- 
ting parents (7 males:23 females, P = 5.53 x 10“). Thus, there is an 
overrepresentation of males in the medically ascertained group, and a 
depletion in the non-medically ascertained one. We suggest that males 
may be more likely than females to present severe phenotypes, and that 
this may account for the gender bias because severely affected males 
may be less likely to be recruited to adult population cohorts or to be 
reproductively successful. 

As previously reported”’, the duplication was also associated with 
reduced head circumference (mean Z-score —0.89; P= 7.8 X 10°) 
(Fig. 1), 26.7% presenting with microcephaly (head circumference ≤
−2 s.d.), whereas carriers of the reciprocal deletion had an increased
head circumference (mean Z-score +0.57; P= 1.79 X 10 °) (Sup- 
plementary Fig. 3 and Supplementary Table 8): an additional instance 
of a mirror phenotype associated with reciprocal changes in copy 
number at this locus. Notably, head circumference Z-scores correlate 

Table 1 | 16p11.2 rearrangements in cases and controls

Ascertainment and cohort                            Duplication                Deletion                   Total
                                                    n      P†                  n      P†

Neurodevelopmental disorders
  Unspecified DD/ID* from 28 cytogenetic centres    72                         113                        30,323
  ADHD‡, deCODE                                     0                          1                          591
  Childhood autism‡, deCODE                         0                          2                          159
  Childhood autism spectrum disorder‡, deCODE       1                          3                          351
  TOTAL                                             73     4.23 x 10-78        119    543x103?            31,424
  Rearrangement frequency (95% CI)                  0.23% (0.18-0.29)          0.38% (0.31-0.45)

Family history
  First-degree relatives of probands                30                         35                         43/62 ||

Adult psychiatric symptoms
  Schizophrenia, deCODE                             0                          1                          657
  Bipolar disease, Rouen                            1                          0                          56
  Schizophrenia, schizo-affective, Rouen            3                          0                          267
  TOTAL                                             4      3.57 x 10-3         1      3.78 x 107-7        1,080
  Rearrangement frequency (95% CI)                  0.37% (0.01-0.73)          0.09% (0-0.27)

Underweight
  Eating disorder, Spain                            1§                         0                          441

Obesity
  Obesity, Spain                                    0                          2                          653
  Adult obesity, France                             0                          4                          705
  Childhood obesity, France & UK                    0                          7                          1,574
  Obesity bariatric surgery, France                 0                          2                          41
  Obesity discordant siblings, Sweden               0                          2                          159
  Obesity and cognitive delay, France & UK          0                          9                          312
  TOTAL                                             0      4.21x 107           26     2.52 x 10779        3,544
  Rearrangement frequency (95% CI)                  0                          0.73% (0.45-1.01)

Population-based cohorts
  NFBC1966 Finnish                                  4                          3                          5,319
  CoLaus Swiss                                      5                          0                          5,612
  EGCUT Estonian                                    2                          1                          2,994
  deCODE Iceland                                    17                         18                         36,601
  SHIP Germany                                      1                          2                          4,070
  KORA F3+F4 Germany                                2                          1                          3,458
  Paediatric family study                           0                          0                          581
  TOTAL                                             31                         25                         58,635
  Rearrangement frequency (95% CI)                  0.05% (0.03-0.07)          0.04% (0.03-0.06)

CI, confidence interval; ADHD, attention-deficit hyperactivity disorder. *Not a disease-specific cohort. Detailed distribution is provided in the online methods. †Fisher's exact test, compared to the combined
frequency in general population groups. ‡There was no overlap between these 3 cohorts. §Atypical duplication (see Supplementary Fig. 5). ||Total number of parental pairs tested for duplication/deletion. 13 out of
43 duplications and 27 out of 62 deletion cases were de novo.



positively with those of BMI in carriers of both the duplication 
(rho = 0.37; P=2.65X10°*) and the deletion (rho = 0.42;
P=19X10 °) (Supplementary Methods). This indicates that head 
circumference and BMI may be regulated by a common pathway, or 
that a causal relationship exists between these two traits in these 
patients. Alternatively, the two phenotypes may arise from distinct 
genes and pathways. A full list of malformations and secondary phe- 
notypes reported in duplication carriers ascertained for DD/ID is 
available in Supplementary Table 9. 

In view of the importance of modified eating behaviours in obesity 
and being underweight, the clinical reports of duplication carriers were 
screened for evidence of such modified behaviours. In 11 out of 77 
clinically ascertained cases, clinicians had spontaneously reported 
low food intake and selective and restrictive eating behaviour, again 
mirroring one of the phenotypes—hyperphagia—seen in deletion 
carriers’? (Supplementary Table 6) and indicating that the duplication 
may increase the risk of eating disorders. Consequently, we carried out
multiplex ligation-dependent probe amplification (MLPA, Supplementary
Table 10) to screen for 16p11.2 rearrangements in 441 patients
diagnosed with eating disorders, including anorexia nervosa, bulimia and 
binge eating disorder (Table 1 and Supplementary Text). No duplications 
of the entire region were identified, but one out of 109 anorexia nervosa 
patients carried an atypical 136-kb duplication that encompasses the 
sialophorin (SPN) and quinolinate phosphoribosyltransferase (QPRT) 
genes (Supplementary Fig. 4). This single, smaller duplication does not 
allow us to draw any firm conclusions, but together with other atypical 
rearrangements, it may, in the future, be essential for establishing the 
roles of the 28 genes within the region. 

Large genomic structural variants are known to affect the expression 
of genes not only within the affected region but also at a distance”. 
Therefore, it is possible that the phenotypes observed in 16p11.2 dele- 
tion and duplication individuals are due to effects on the expression of 
genes mapping outside the rearranged interval, rather than to gene 
dosage within the 600-kb deletion or duplication. We measured 


Table 2 | Comparisons of the height, weight and BMI distributions in duplication carriers and controls.

                  Combined†                           DD/ID or psychiatric†               Non-medically ascertained‡
        Strata    Mean Z-score  P-value        n*     Mean Z-score  P-value        n*     Mean Z-score  P-value        n*

BMI     All       −0.47         2.0 x 10-3     102    −0.56         4.1x10°3       76     −0.45         6.0 x 10-3     40
        Male      −0.54         2.1x 10-2      52     −0.71         1.3 x 107?     43     −0.31         2.0 x10}       14
        Female    −0.4          1.8 x 107?     50     −0.37         83x10?         33     −0.52         4.2 x 10-3     26

Weight  All       −0.56         4.4 x 10-4     104    −0.65         1.3 x 10-3     78     −0.61         3.0 x 107%     40
        Male      −0.64         5.8 x 1073     53     −0.79         4.4 x 10-3     44     −0.57         88x10?         14
        Female    −0.47         1.7 x 10°?     51     −0.47         6.5 x 10°?     34     −0.63         8.6 x 10-3     26

Height  All       −0.24         4.8 x 10-2     103    −0.33         3.6 x 10-7     77     −0.15         1.8x107!       40
        Male      −0.34         4.5 x 10-2     52     −0.4          4.6 x 10-2     43     −0.29         1.2x107!       14
        Female    −0.14         2.6 x10        51     −0.24         2.1 x1071      34     −0.07         3.7 x10!       26

The available BMI, weight and height data for duplication carriers were transformed to Z-scores using gender- and age-matched reference populations, and one-tailed t-tests were carried out to determine whether
the mean Z-scores deviated from zero. Significant differences were identified by reference to cutoffs controlling the false discovery rate at 5% (see Methods): BMI, 0.022; weight, 0.032; height, 0.025. Significant
results are indicated in bold. Data were not available for all subjects. *Relatives of probands were excluded as required, to avoid including more than one member of the same family in a single analysis. †Including
24 cases from the literature (Supplementary Table 3). ‡Population-based cases and first-degree relatives of probands.

[Figure 1 panels: BMI (Z-score) and head circumference (Z-score) by age group (years), shown separately for female and male cases.]

Figure 1 | Effect of the chromosome 16p11.2 duplication on BMI and head 
circumference. Z-score values of BMI and head circumference in carriers of 
the 16p11.2 duplication, stratified by gender and age group. The most severe 
effect is observed in children at 0-5 years of age. Boxplots represent the fifth, 
twenty-fifth, median, seventy-fifth and ninety-fifth percentile for each age 
group. Light grey and dark grey backgrounds represent ≤ −2 and ≤ −3 s.d.,
respectively, corresponding to the WHO definition of moderately and severely 
underweight”. BMI is decreased in adolescent and adult females. 


relative transcript levels of 27 genes mapping within or near to the 
rearrangement, using lymphoblastoid cell lines (Supplementary 
Tables 1 and 11): six from deletion carriers, five from duplication 
carriers and ten from gender- and age-matched controls (Supplemen- 
tary Table 12). Expression levels correlated positively with gene dosage 
for all genes in the CNV region (Fig. 2), consistent with published 
partial results from adipose tissue’®. Mean relative transcript levels 
in deletion and duplication carriers were, respectively, 67% and 
214% of the levels measured in controls (Supplementary Table 13). 
Although genes proximal (centromeric) to the rearrangement interval 
showed no significant variation in relative transcript levels between
patients and controls (Fig. 2), distal (telomeric) genes showed a 
marked alteration in relative expression. However, their expression 
levels, including that of SH2B1 (for which gene dosage and a nearby 
single nucleotide polymorphism (SNP) have been associated with 
obesity’**°), were similarly upregulated in cell lines of both deletion 
and duplication carriers, showing no apparent correlation between 
transcript level and either copy number or phenotype (Fig. 2). 
Although lymphoblastoid cells may not recapitulate obesity-relevant 
tissues, previous experiments have shown a high degree of correlation 
between expression levels in different tissues and cell lines”, indicating 
that the same pathways may be similarly disrupted in different cell 
lineages. Thus, any involvement of these distal genes in the control of 
BMI in these subjects seems unlikely. 

Our study demonstrates the power of very large screens (>95,000 
samples: to our knowledge the largest of its kind so far) to characterize 
the clinical and molecular correlates of a rare functional genomic vari- 
ant. We demonstrate unambiguously that carrying the 16p11.2 duplica- 
tion confers a high risk of being clinically underweight, and show that 
reciprocal changes in gene dosage at this locus result in several mirror 
phenotypes. As in the schizophrenia/autism'* and microcephaly/ 
macrocephaly”' dualisms, abnormal eating behaviours, such as hyper- 
phagia and anorexia, could represent opposite pathological manifesta- 
tions of a common energy-balance mechanism, although the precise 
relationships between these mirror phenotypes remain to be deter- 
mined. We speculate that head circumference (which correlates with 
brain volume”’), and thus neuronal circuitry, may affect cognitive func- 
tion and energy balance in patients with 16p11.2 rearrangements 
(possibly through eating behaviour). Consistent with this are previous 
reports that a subgroup of children with microcephaly show a con- 
comitant reduction in weight percentile’. Our findings also support 
the observation that severe overweight and underweight phenotypes 
correlate with lower cognitive functioning*’’. Thus, abnormal food 
intake may be a direct result of particular neurodevelopmental di- 
sorders. Although it is possible that the 16p11.2 region encodes distinct 
genes specific for each trait, a more parsimonious hypothesis is that 
these clinical manifestations of dysfunction of the central nervous sys- 
tem are all secondary to the disruption of a single neurodevelopmental 
step that is sensitive to gene dosage. Further resolution of this issue may 
Figure 2 | Transcript levels for genes within and near to the 16p11.2 
rearrangements. a, Relative expression levels of 27 genes mapping to 16p11.2 
in deletion (n = 6) and duplication (n = 5) carriers (red and green, 
respectively), and in control cell lines (n = 10, blue). Grey lines denote the 
extent of the 16p11.2 CNV (29.5-30.1 megabases (Mb)). Complete lists of 
genes mapping within the rearranged interval, and of the quantitative PCR 
assays, are in Supplementary Tables 1 and 11, respectively. For the possible 
relevance of each of these genes to obesity/leanness and/or developmental 
delay/cognitive deficits, see ref. 10. b, Rank comparison (Kruskal-Wallis test) 
between the expression of 27 genes mapping to 16p11.2 in deletion and 
duplication carriers (red and green, respectively) and in control cell lines (blue). 
Genes are labelled as telomeric, centromeric or within the rearranged interval 
(CNV). Dots correspond to the mean group rank and bars indicate the 
comparison interval. Groups with non-overlapping intervals are significantly 
different (P-values were adjusted for multiple testing issues using a Bonferroni 
correction, where the number of tests is the number of pairwise comparisons; 
the resulting adjusted P-value was less than 0.05). 
require the identification of additional patients with rare atypical 
rearrangements in this region. 


METHODS SUMMARY 


Underweight is defined in adults as BMI ≤ 18.5. In individuals younger than 
18 years of age, it is defined as a Z-score ≤ −2. 

Statistics. Two-tailed Fisher’s exact test was used to compare frequencies of the 
rearrangement in patients and controls. Z-scores were computed for all data using 
gender-, age- and geographically-matched reference populations. One-tailed 
Student's t-test was performed to test BMI, height, weight and head circumference 
in duplication carriers for Z-scores of less than zero. We used Kruskal-Wallis tests 
for differences in gene expression patterns. P-values were adjusted using a 
Bonferroni correction, considering the number of pairwise comparisons; the result- 
ing adjusted P-value was less than 0.05. The relative risk of being underweight was 
calculated as the ratio of the fraction of underweight individuals among duplication 
carriers versus our control group. 

Discovery of CNVs. Carriers of 16p11.2 duplication and deletion were identified 
through various procedures: (1) comparative genomic hybridization with Agilent 
44K, 60K, 105K, 180K, 244K arrays; (2) Illumina Human317, Human370, 
HumanHap550, Human610 and 1M BeadChips; (3) Affymetrix 6.0, 500K geno- 
typing arrays; (4) quantitative multiplex PCR of short fluorescent fragments 
(QMPSF); (5) fluorescent in situ hybridization (FISH); (6) MLPA. CNV analyses 
of GWAS data were carried out using cnvHap, a moving-window average-intensity 
procedure, a Gaussian mixture model, circular binary segmentation, QuantiSNP, 
PennCNV, BeadStudio GT module and Birdseed. At least two independent algo- 
rithms were used for each cohort. 

Expression analyses. Lymphoblastoid cell lines were established from carriers and 
controls. SYBR Green quantitative PCR was performed to assess relative expres- 
sion of genes. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 


Study cohorts. For the description of these cohorts, refer to Supplementary 
Information. 

CNV detection. Cases ascertained for intellectual disabilities and developmental 
delay were identified through standard medical diagnostic procedures. CNV ana- 
lyses of GWAS data were variously carried out using cnvHap”*'; a moving-window 
average-intensity procedure; a Gaussian mixture model (Valsesia et al., submitted); 
circular binary segmentation’; QuantiSNP**; PennCNV~; BeadStudio GT module 
(Illumina Inc.); and Birdseed*® (see below). At least two independent algorithms were 
used for each cohort. 

Patients referred for intellectual disabilities and developmental delay. All 
diagnostic procedures (CGH, quantitative PCR and/or quantitative multiplex 
PCR of short fluorescent fragments) were carried out according to the relevant 
guidelines of good clinical laboratory practice for the respective countries. All 
rearrangements in probands were confirmed by a second independent method 
and karyotyping was performed in all cases to exclude a complex rearrangement. 
Northern Finland 1966 birth cohort (NFBC). CNV calling has been previously 
described'®. In brief, data were normalized using Illumina BeadStudio, then GC 
effects on ratios were removed by regressing on GC and GC², and wave effects 
were removed by fitting a Loess function*’. CNV analysis was done using 
cnvHap”'. All called 16p11.2 duplications were validated by direct analysis of 
log₂ ratios. Data for each probe were normalized by first subtracting the median 
value across all samples (so that the distribution of ratios for each probe was 
centred on zero), and then dividing by the variance across all samples (to correct 
for variation in the sensitivity of different probes to copy-number variation). All 
CNV calls were confirmed by MLPA. 
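
The per-probe normalization described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name is hypothetical, and the use of the population variance (rather than the sample variance) is an assumption, since the text only says "variance across all samples".

```python
from statistics import median, pvariance


def normalize_probe(log2_ratios):
    """Centre one probe's log2 ratios on zero, then scale by the variance.

    log2_ratios: log2 ratios for a single probe, one value per sample.
    Assumption: population variance is used; the text does not specify.
    """
    centre = median(log2_ratios)
    spread = pvariance(log2_ratios)
    if spread == 0:
        raise ValueError("probe shows no variation across samples")
    # Subtract the median so the distribution is centred on zero,
    # then divide by the variance to correct for probe sensitivity.
    return [(value - centre) / spread for value in log2_ratios]
```

Applying the function probe by probe puts probes with different sensitivities to copy-number variation on a comparable scale before duplication calls are inspected.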

deCODE genetics. Illumina Human317, Human370, HumanHap550, Human610 
and 1M BeadChips were used for CNV analysis. BeadStudio (version 2.0) was used 
to call genotypes, normalize the signal intensity data and establish the log R ratio 
(LRR) and B allele frequency (BAF) at every SNP according to standard Illumina 
protocols. All samples passed a standard SNP-based quality control procedure 
with a SNP call rate greater than 0.97. PennCNV”’, a free, open-source tool, was 
used for detection of CNVs. The input data for PennCNV are LRR, a normalized 
measure of the total signal intensity for the two alleles of the SNP, and BAF, a 
normalized measure of the allelic intensity ratio of the two alleles. These values are 
derived with the help of control genotype clusters (HapMap samples), using the 
Illumina BeadStudio software. PennCNV employs a hidden Markov model to 
analyse the LRR and BAF values across the genome. CNV calls are made on the 
basis of the probability of a given copy state at the current marker, as well as on 
the probability of observing a copy-state change from the previous marker to the 
current one. PennCNV uses a built-in correction model for GC content”*. 
Cohorte Lausannoise (CoLaus). Data normalization and CNV calling have been 
previously described'®. Data normalization included allelic cross-talk calibration*”°, 
intensity summarization using robust median average, and correction for any PCR 
amplification bias. Wave effects were corrected by fitting a Loess function*”. CNV 
calling was done using a Gaussian mixture model (Valsesia et al., submitted) that fits 
four components (deletion, copy-neutral, one additional copy and two additional 
copies) to copy-number ratios. The final copy number at each probe location is 
determined as the expected (dosage) copy number. The method has been validated 
by comparing test data sets with results from the CNAT" and CBS**” algorithms, 
and by replicating a subset of CoLaus subjects on Illumina arrays. Only duplications 
found by both Gaussian mixture model and CBS were considered. 
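
The "expected (dosage) copy number" step can be illustrated with a short sketch. It assumes that the posterior probabilities of the four mixture components are already available for a probe, and that the components map to 1, 2, 3 and 4 copies; that mapping and the function name are assumptions made for illustration, not details taken from the method itself.

```python
# Assumed mapping: deletion, copy-neutral, one additional copy, two additional copies.
COPY_NUMBERS = (1, 2, 3, 4)


def expected_copy_number(posteriors):
    """Return the posterior-weighted (dosage) copy number at one probe.

    posteriors: four non-negative weights, one per mixture component.
    """
    if len(posteriors) != len(COPY_NUMBERS):
        raise ValueError("expected one posterior per mixture component")
    total = sum(posteriors)
    if total <= 0:
        raise ValueError("posteriors must sum to a positive value")
    # Weighted average of the component copy numbers.
    return sum(p * c for p, c in zip(posteriors, COPY_NUMBERS)) / total
```

A probe split evenly between the copy-neutral and one-additional-copy components would receive a dosage of 2.5 under these assumptions.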

Estonian genome center of the University of Tartu (EGCUT). Genotypes were 
called by BeadStudio software GT module v3.1 or GenomeStudio GT v1.6 
(Illumina Inc.). Values for LRR and BAF produced by BeadStudio were formatted 
for further CNV analysis and break-point mapping with the hidden-Markov-model-based 
software QuantiSNP (ver. 1.1)** and PennCNV™ or CNVPartition 2.4.4 
(Illumina Inc.). All analyses were carried out using the recommended settings, 
except changing EMiters to 25 and L to 1,000,000 in QuantiSNP. For PennCNV, 
the Estonian-population-specific SNP allele frequency data were used. All detected 
duplications were confirmed by quantitative PCR. 

Study of health in Pomerania (SHIP). Raw intensities were normalized using 
Affymetrix power tools (Affymetrix); CNV analysis was done using Birdseye from 
the Birdsuite software package*® and PennCNV*. PennCNV predictions with 
confidence scores less than 10 were removed. Birdsuite predictions were filtered 
as in ref. 15: CNVs were kept if their logarithm of odds (LOD) score was >10, 
length >1 kb, number of probes ≥5 and size per number of probes <10,000. 
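
The Birdsuite filtering criteria listed above amount to a simple predicate. The sketch below is illustrative only: the `CnvCall` record and its field names are hypothetical, and the probe-count criterion is read as "at least 5 probes", which is an assumption about the intended inequality.

```python
from dataclasses import dataclass


@dataclass
class CnvCall:
    """Hypothetical record for one predicted CNV."""
    lod_score: float
    length_bp: int
    n_probes: int


def passes_birdsuite_filter(call):
    """Keep a call only if it satisfies all four criteria."""
    return (
        call.lod_score > 10
        and call.length_bp > 1_000              # length > 1 kb
        and call.n_probes >= 5                  # assumed reading: at least 5 probes
        and call.length_bp / call.n_probes < 10_000  # size per probe
    )
```

The probe-count check precedes the division, so a call with zero probes is rejected before the size-per-probe ratio is evaluated.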
Kooperative Gesundheitsforschung in der Region Augsburg (KORA) F3 and 
F4. Genotyping for KORA F3 was performed using the Affymetrix 500K array set, 
consisting of two chips (StyI and NspI). The KORA F4 samples were genotyped 
with the Affymetrix human SNP array 6.0. For both studies, genomic DNA from 
blood samples was used for analysis. Hybridization of genomic DNA was done in 
accordance with the manufacturer’s standard recommendations. Genotyping was 
done in the Genome Analysis Centre of the Helmholtz Centre Munich. Genotypes 
were determined using BRLMM clustering algorithm (Affymetrix 500K array set) 
and Birdseed2 clustering algorithm (Affymetrix array 6.0). For quality control 
purposes, we applied a positive control and a negative control DNA every 48 sam- 
ples (KORA F3) or 96 samples (KORA F4). On the chip level, only subjects with 
overall genotyping efficiencies of at least 93% were included. In addition, the called 
gender had to agree with the gender in the KORA study database. After exclusions, 
1,644 individuals remained in KORA F3 and 1,814 in KORA F4 for further 
analysis. 

MLPA analysis. We used MLPA to determine changes in the copy number of a 
region of about 2 Mb on chromosome 16p11.2. Briefly, we designed, using hg18, 
nine probes within the targeted region, one control probe outside the rearranged 
region and seven control probes targeting unique positions in the genome 
(Supplementary Table 10). Assays were performed with MRC-Holland reagents 
according to the manufacturer’s protocol’. The analysis of the amplification 
products was performed by capillary electrophoresis in the DNA Analyser 
3730XL and using the GeneMapper software v3.7 (Applied Biosystems). The 
calculations were performed independently for each experiment: we first normal- 
ized the MLPA data to minimize the amount of experimental variation, summing 
all signal values of each control probe for each sample, and then dividing each 
signal value of each sample by the sum. The normalized signal values were com- 
pared to signal values from all other samples in the same experiment, dividing the 
normalized signal values by the average calculated from all the samples in the same 
experiment. The product of this calculation is termed dosage quotient (DQ). A DQ 
value of less than 0.65 or more than 1.25 was considered as copy-number loss or 
gain, respectively, as previously described**”. 
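
The dosage-quotient calculation described above can be sketched in a few lines. The data layout (a nested dictionary of peak signals) and the function names are assumptions for illustration; only the two normalization steps and the 0.65 and 1.25 thresholds come from the text.

```python
def dosage_quotients(signals, control_probes):
    """Compute dosage quotients (DQ) for one MLPA experiment.

    signals: {sample: {probe: signal}} with the same probes in every sample.
    control_probes: names of the probes used for within-sample normalization.
    """
    # Step 1: divide each signal by the sum of that sample's control-probe signals.
    normalized = {}
    for sample, probes in signals.items():
        control_sum = sum(probes[name] for name in control_probes)
        normalized[sample] = {name: value / control_sum for name, value in probes.items()}

    # Step 2: divide by the average normalized signal across all samples.
    probe_names = list(next(iter(signals.values())))
    quotients = {sample: {} for sample in signals}
    for name in probe_names:
        mean = sum(normalized[s][name] for s in signals) / len(signals)
        for s in signals:
            quotients[s][name] = normalized[s][name] / mean
    return quotients


def classify_dq(dq):
    """Apply the thresholds quoted in the text."""
    if dq < 0.65:
        return "loss"
    if dq > 1.25:
        return "gain"
    return "normal"
```

Because each DQ is relative to the experiment-wide average, this scheme presumes that most samples in a run carry the normal copy number.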

Custom array-CGH for the short arm of chromosome 16. DNA samples were 
labelled with Cy3 and cohybridized to custom-made Nimblegen arrays with Cy5- 
labelled DNA from the CEPH cell line GM12042. These arrays contained 71,000 
probes spread across the short arm of chromosome 16 from 22.0 Mb to 32.7 Mb (at 
a median space of 45 bp between 27.5 Mb and 31.0 Mb), and 1,000 control probes 
situated in invariable regions of the X chromosome. DNA labelling, hybridization 
and washing were performed according to Nimblegen protocols. Scanning was 
performed using an Agilent G2565BA microarray scanner. Image processing, 
quality control and data extraction were performed using the Nimblescan software 
v.2.5. 

Defining underweight. Underweight was defined throughout the study as 
BMI ≤ 18.5 kg per m² in adults and < −2 s.d. in children*™“”“*. 

Weight, height, BMI and head circumference Z-scores as a function of age. For 
paediatric cases (0-18 years of age), weight, height, BMI and head circumference 
Z-scores were determined using clinical growth charts specific 
to the country of origin. Children were ascertained from nine different countries. If 
charts were only available in percentiles, those measures were transformed into 
Z-scores using gender-, age- and geographically-matched reference populations 
(see Statistics). 

For the USA and Canada, data from the Center for Disease Control and 
National Center for Health Statistics (CDC/NCHS) were used to calculate 
Z-scores””. 

For the French paediatric population, we used French national growth 
charts*’*'. For the Swiss paediatric population, we used Swiss national growth 
charts”. For Dutch participants, Dutch national growth charts were used”. For 
Italian, German, Finnish and Austrian cases (n = 6), height, weight and BMI 
Z-scores were estimated using WHO growth charts™. 

To check for discrepancies generated by the use of different growth charts, 
height, weight and BMI Z-scores were recalculated using WHO growth charts 
for all cases under five years of age, regardless of origin 
(http://www.who.int/childgrowth/standards/en/; ref. 54). Z-scores obtained using the WHO data were not 
significantly different. These growth standards, developed by the World Health 
Organization multicentre growth reference study, describe normal child growth 
from birth to 5 years under optimal environmental conditions. These standards 
can be applied to all children everywhere, regardless of ethnicity, socioeconomic 
status and type of feeding®**. 

If necessary, percentile values were transformed to Z-scores by the inverse of 
the standard normal cumulative distribution function. When growth charts were 
unavailable, we used reported LMS parameters (median (M), generalized coefficient 
of variation (S) and skewness (L)) to obtain Z-scores via the formula: 

$$
Z\text{-score} =
\begin{cases}
\dfrac{(X/M)^{L} - 1}{L\,S}, & L \neq 0 \\[2ex]
\dfrac{\ln(X/M)}{S}, & L = 0
\end{cases}
$$

in which X is the observed value. 
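
As a worked illustration of the two transforms just described, the sketch below computes a Z-score from LMS parameters and converts a percentile to a Z-score. Function names are hypothetical; the percentile conversion uses the standard normal quantile function from Python's standard library.

```python
import math
from statistics import NormalDist


def lms_z_score(x, l, m, s):
    """Z-score of an observed value x given LMS parameters.

    l: skewness (Box-Cox power), m: median, s: generalized coefficient of variation.
    """
    if x <= 0 or m <= 0 or s <= 0:
        raise ValueError("x, m and s must be positive")
    if l == 0:
        return math.log(x / m) / s
    return ((x / m) ** l - 1) / (l * s)


def percentile_to_z(percentile):
    """Convert a percentile (0 < percentile < 100) to a standard normal Z-score."""
    if not 0 < percentile < 100:
        raise ValueError("percentile must lie strictly between 0 and 100")
    return NormalDist().inv_cdf(percentile / 100)
```

An observation equal to the median gives a Z-score of zero whatever the values of L and S, which is a quick sanity check on any set of LMS parameters.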
In adults (>18 years of age), when LMS parameters were unavailable, we estimated 
them from the available sex-, age- and origin-matched Swiss (CoLaus), 
Estonian or French control populations. For cases identified from population- 
based cohorts, Z-scores were directly inferred from the cohort. 

Gene expression. We established lymphoblastoid cell lines from deletion and 
duplication carriers, as well as from controls (Supplementary Table 12), by trans- 
forming peripheral blood mononuclear cells with Epstein-Barr virus. Patients and 
controls were enrolled after obtaining appropriate informed consent via the physi- 
cians in charge, and approval by the ethics committee of the University of Lausanne. 
More control cell lines were obtained from Coriell Institute for Medical Research 
(http://www.coriell.org/) (Supplementary Table 12). SYBR Green real-time 
quantitative PCR (RT-PCR) was performed as previously described’*””. Briefly, 1 µg of 
total RNA from lymphoblastoid cell lines was converted to complementary DNA 
using Superscript VILO (Invitrogen) primed with a mixture of oligo(dT) and random 
hexamers. Oligos were designed using the PrimerExpress program (Applied 
Biosystems) with default parameters (Supplementary Table 11). Non-intron-spanning 
assays were tested for genomic contamination in standard ± reverse 
transcriptase reactions. The amplification efficiency of each primer pair was tested 
in a cDNA dilution series, as previously described”. A full list of genes mapping in 
the rearranged interval, and exclusion criteria, are presented in Supplementary 
Table 1. All RT-PCR reactions were performed in a 10-µl final volume and in 
triplicate per sample. The setup in a 384-well plate format was performed using a 
Freedom EVO robot (TECAN) and assays were run in an ABI 7900 sequence 
detection system (Applied Biosystems) with the following amplification conditions: 
50 °C for 2 min, 95 °C for 10 min, and 45 cycles of 95 °C for 15 s, then 60 °C for 
1 min. A final incubation of 95 °C for 15 s followed by 60 °C for 15 s was carried out 
to establish a dissociation curve. Each plate included the appropriate normaliza- 
tion genes to control for any variability between plate runs. Raw threshold cycles 
(Ct) values were obtained using SDS2.4 (Applied Biosystems). To calculate the 
normalized relative expression ratio of individuals carrying the CNV and of 
controls, we used Biogazelle qBase Plus software” including geNorm®. This 
program identified appropriate normalization genes (EEF1A1, RPL13, GUSB 
and TBP) having a gene-stability measure of M = 0.25. We note that one gene, 
LAT, showed a very high expression profile in one of the duplication samples 
(DASYL, Supplementary Table 13), reaching a relative expression value of 27.3 
(s.e.m. = 1.37), compared to an average expression for other duplications of 1.89 
(s.e.m. = 0.51). We cannot exclude that this finding is genuine (and confirmed it in 
a second experiment), but it was removed from further analyses as an outlier to give 
a more accurate overview of expression profiles for these genes. 

In silico analysis was performed to check for brain, and specifically hypothalamus, 
expression of genes in the rearranged 16p11.2 interval (Supplementary Table 1). This 
was done using Allen Brain Atlas Resources, available from http://www.brain-map. 
org. 

Cases with major neurological signs. Major neurological signs were defined 
by moderate to severe hypotonia, hypertonia, ataxia, spasticity, hyperreflexia, 
hyporeflexia and/or extra-pyramidal signs, and by the presence of epilepsy. 
Statistics. Student’s t-test: one-tailed t-tests were performed to test whether 
duplication carriers have Z-score values lower than zero for BMI, height and 
weight. We found this analysis more suitable than linear regression analysis, 
correcting for confounding factors such as sex and age, because these anthro- 
pometric traits have a highly nonlinear dependence on these factors, as can be 
observed in control populations. 

Kruskal-Wallis test: this was used to test differences in the gene expression 
pattern between deletion and duplication carriers and control individuals. 
Because expression values are not necessarily normally distributed, this test is more 
adequate than a classical one-way analysis of variance. To test pairwise differences, 
we computed the difference in mean group rank with its 95% confidence interval 
(as provided by the multcompare function in Matlab). Correction for multiple 
testing was done using a Bonferroni adjustment. 
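
A rank-based comparison of three groups followed by a Bonferroni adjustment can be sketched as below. This is not the Matlab multcompare procedure used in the study: it is a plain-Python omnibus Kruskal-Wallis test restricted to three groups, for which the chi-square approximation with 2 degrees of freedom has the closed-form survival function exp(-H/2). All function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def kruskal_wallis_three_groups(a, b, c):
    """Return (H, p) for three groups using the chi-square approximation."""
    groups = [a, b, c]
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        total += sum(ranks[start:start + len(g)]) ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Standard correction for tied observations.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction
    return h, math.exp(-h / 2)  # chi-square survival function, 2 d.f.


def bonferroni(p_values):
    """Multiply each P-value by the number of tests, capped at 1."""
    return [min(1.0, p * len(p_values)) for p in p_values]
```

With groups as small as those analysed here, the chi-square approximation is only rough, so the sketch is best read as a description of the logic rather than a substitute for an exact or library implementation.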

Multiple testing: we determined false-discovery-rate-based thresholds for asso- 
ciation P-values for each phenotype, to correct for multiple testing. For each 
phenotype, we replaced the observed Z-scores with numbers randomly drawn 
from a standard normal distribution and performed the same t-tests for the same 
strata. The procedure was repeated 1,000 times. For various P-value thresholds, we 
asked how many tests would be declared significant for the null set on average 
(over the 1,000 random draws). The false discovery rate was estimated as the ratio 
of this number to the actual number obtained for the observed Z-scores. Thus, 
we controlled the dependence between nested tests. 
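
The ratio that defines this false-discovery-rate estimate can be written down directly. The sketch assumes the P-values have already been computed, both for the observed Z-scores and for each random draw; the function and argument names are hypothetical, and returning `None` when nothing is significant is a choice made here, not something stated in the text.

```python
def estimate_fdr(observed_p, null_p_sets, threshold):
    """Estimate the FDR at one P-value threshold.

    observed_p: P-values from the tests on the observed Z-scores.
    null_p_sets: one list of P-values per random draw (same tests, same strata).
    """
    if not null_p_sets:
        raise ValueError("at least one null draw is required")
    observed_hits = sum(p <= threshold for p in observed_p)
    if observed_hits == 0:
        return None  # ratio undefined when no observed test is significant
    # Average number of tests declared significant under the null.
    mean_null_hits = sum(
        sum(p <= threshold for p in draw) for draw in null_p_sets
    ) / len(null_p_sets)
    return mean_null_hits / observed_hits
```

Evaluating the function over a grid of thresholds gives the threshold-specific estimates from which a cut-off can be chosen for each phenotype.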

Relative risk: among adults, we defined underweight as a BMI <18.5 (WHO 
criteria). The estimated relative risk is the ratio of the fraction of underweight 
individuals among duplication carriers versus our control group. The standard 
error of log(relative risk) and its significance were calculated as previously 
described*’. In our control group (population-based cohorts), the frequency of 
being underweight is 1.9% (38 males and 148 females out of 9,470). Because the 
frequency of being underweight decreases with age in the general population, we 
resampled our control group to ensure precise age-matching. 
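
The relative-risk calculation can be sketched as follows. The standard error shown is the usual large-sample formula for log(relative risk), which is assumed here to correspond to the cited method, and the two-sided normal P-value is likewise an assumption. The function name is hypothetical. Only the control counts (186 underweight out of 9,470) come from the text; any carrier counts supplied to the function are the caller's own.

```python
import math
from statistics import NormalDist


def relative_risk(cases_carriers, n_carriers, cases_controls, n_controls):
    """Return (relative risk, s.e. of log relative risk, two-sided P-value)."""
    risk_carriers = cases_carriers / n_carriers
    risk_controls = cases_controls / n_controls
    rr = risk_carriers / risk_controls
    # Large-sample standard error of log(RR); assumed, not quoted from the paper.
    se_log_rr = math.sqrt(
        1 / cases_carriers - 1 / n_carriers + 1 / cases_controls - 1 / n_controls
    )
    z = math.log(rr) / se_log_rr
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rr, se_log_rr, p_value


# Control frequency from the text: 38 males + 148 females out of 9,470.
CONTROL_CASES, CONTROL_TOTAL = 186, 9470
```

Working on the log scale keeps the confidence interval for the relative risk positive and approximately symmetric, which is why the standard error is reported for log(relative risk).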
Multiple reference genomes and 
transcriptomes for Arabidopsis thaliana 


Xiangchao Gan, Oliver Stegle, Jonas Behr, Joshua G. Steffen, Philipp Drewe, Katie L. Hildebrand, Rune Lyngsoe, 
Sebastian J. Schultheiss, Edward J. Osborne, Vipin T. Sreedharan, André Kahles, Regina Bohnert, Géraldine Jean, 
Paul Derwent, Paul Kersey, Eric J. Belfield, Nicholas P. Harberd, Eric Kemen, Christopher Toomajian, Paula X. Kover, 
Richard M. Clark, Gunnar Rätsch & Richard Mott 


Genetic differences between Arabidopsis thaliana accessions underlie the plant’s extensive phenotypic variation, and 
until now these have been interpreted largely in the context of the annotated reference accession Col-0. Here we report 
the sequencing, assembly and annotation of the genomes of 18 natural A. thaliana accessions, and their transcriptomes. 
When assessed on the basis of the reference annotation, one-third of protein-coding genes are predicted to be disrupted in 
at least one accession. However, re-annotation of each genome revealed that alternative gene models often restore coding 
potential. Gene expression in seedlings differed for nearly half of expressed genes and was frequently associated with cis 
variants within 5 kilobases, as were intron retention alternative splicing events. Sequence and expression variation is most 
pronounced in genes that respond to the biotic environment. Our data further promote evolutionary and functional 
studies in A. thaliana, especially the MAGIC genetic reference population descended from these accessions. 


Interpreting the consequences of genetic variation has typically relied on 
a reference sequence, relative to which genes and variants are annotated. 
However, this may cause bias, because genes may be inactive in the 
reference but expressed in the population’, suggesting that sequencing 
and re-annotating individual genomes is necessary. Advances in 
sequencing’ make this tractable for Arabidopsis thaliana’, whose 
natural accessions (strains) are typically homozygous. Relative to the 
119-megabase (Mb) high-quality reference sequence from Col-0 
(ref. 6), diverse accessions harbour a single nucleotide polymorphism 
(SNP) about every 200 base pairs (bp) (ref. 3), and indel variation is 
pervasive*”®. Characterizing this variation is crucial for dissecting the 
genetic architecture of traits by quantitative trait locus mapping in 
recombinant inbred lines (see, for example, ref. 9) or genome-wide 
association in natural accessions’. 

Here we have sequenced and accurately assembled the single-copy 
genomes of 18 accessions that, with Col-0, are the parents of more 
than 700 Multiparent Advanced Generation Inter-Cross (MAGIC) 
lines’, similar to the maize Nested Association Mapping (NAM)” 
population and the murine Collaborative Cross'”. These accessions 
comprise a geographically and phenotypically diverse sample across 
the species’. Using the genomes, seedling transcriptomes and com- 
putational gene predictions we have characterized the ancestry, 
polymorphism, gene content and expression profile of the accessions. 
We show that the functional consequences of polymorphisms are 
often difficult to interpret in the absence of gene re-annotation and 
full sequence data. The assembled genomes also contribute to the 
A. thaliana 1001 Genomes Project?>”’. 


Genome sequencing, assembly and variants 


We assembled the 18 genomes so that single-copy loci would be 
contiguous, with less than one assembly error per gene, and therefore 
suitable for annotation. Accessions were sequenced with Illumina 
paired-end reads* (Supplementary Table 1), generally with two 
libraries with 200-bp and 400-bp inserts and reads of 36 and 51 bp, 
respectively, to between 27-fold and 60-fold coverage. Each genome 
was assembled by using five cycles of iterative read mapping’* com- 
bined with de novo assembly’ (Supplementary Information sections 2 
and 3, and Supplementary Tables 1 and 2). We aligned reads to the 
final assemblies to detect polymorphic regions’ lacking read coverage 
(2.1-3.7 Mb per accession; Supplementary Table 3 and Supplemen- 
tary Fig. 2). At unique loci, polymorphic regions probably reflect 
complex polymorphisms**. The average N50 length (the contig size 
such that 50% of the entire assembly is contained in contigs equal to 
or longer than this value) of contiguous read coverage between poly- 
morphic regions was 80.8 kb (Supplementary Table 4). 
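
The N50 definition given in parentheses above translates into a short function. This is a generic sketch with a hypothetical function name; it takes a list of lengths of contiguous segments and is not tied to any particular assembly file format.

```python
def n50(lengths):
    """Return the N50 of a collection of segment lengths.

    N50 is the length such that segments of that length or longer
    contain at least half of the total summed length.
    """
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

Sorting from longest to shortest and stopping once half the total is reached makes the statistic insensitive to the number of very short segments.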

To report complex alleles consistently, we defined all variants 
against the multiple alignment consensus of Col-0 and the assembled 
genomes. For each accession there were 497,668-789,187 single-base 
differences from Col-0, and about 45,000 ambiguous nucleotides 
(Supplementary Table 5). The latter may reflect heterozygosity 
(particularly in Po-0; Supplementary Figs 5-7) or copy-number 
variants, and they were largely in transposable elements and repeats 
covering 21.9% of the genome (Supplementary Information section 
5.1, and Supplementary Figs 8 and 9). Of 3.07 million SNPs, 45.2% 
were private to single accessions. 

We identified 1.20 million indels, and 104,090 imbalanced sub- 
stitutions, in which a sequence in Col-0 was replaced by a different 
sequence (Supplementary Tables 3 and 7). Although 57.5% of indels 
or imbalanced substitutions were shorter than 6 bp, 1.9% were longer 
than 100 bp, and overall 14.9 Mb of Col-0 sequence was absent in one 
or more accessions (Fig. 1a and Supplementary Fig. 8). The assemblies 
were about 1.6% and about 4.3% shorter than the reference (including 
Figure 1 | Assembly and variation of 18 genomes of A. thaliana. 

a, Classification of sequence, SNPs and indels based on the Col-0 genome. 

b, Assembly accuracy (y axis; base substitution errors per 10 kb) measured 
relative to four validation data sets at each of eight stages in the IMR/DENOM 
assembly pipeline (x axis). Bur-0 survey (blue line): 1,442 survey sequences 
(about 417 bp each) in predominantly genic regions’; Bur-0 divergent (red 
line): 188 sequences (each about 254 bp) highly divergent from Col-0 (ref. 3); 
Ler-0 nonrepetitive (orange line): a predominantly single-copy 175-kb Ler-0 
sequence on chromosome 5; Ler-0 repetitive (purple line): a highly repetitive 
339-kb Ler-0 locus on chromosome 3 (ref. 18; Supplementary Information 
section 4). Iter, iteration. c, Genome-wide distribution of the minimum clade 
size for all pairs of accessions (excluding Po-0). Each pair is represented by a 
grey line, the mean over all pairs by the black line and the random distribution 
by the green line. d, Decay in linkage disequilibrium with distance (Po-0 
excluded). The black line shows r² between SNPs; the red line shows 
phylogenetic r² (Supplementary Information section 6). 


and excluding polymorphic regions, respectively), probably reflecting 
limitations in detecting long insertions. Although sequence differ- 
ences were enriched in transposable-element and intergenic regions, 
about 17% of bases deleted in one or more accessions were annotated 
as genic in Col-0 (Fig. 1a and Supplementary Fig. 8). The density of 
sequence differences is greater than between inbred strains of 
mice’, but less than between lines of maize’”. 

Both iterative and de novo assembly improved accuracy, with the 
latter being most effective at divergent loci (Fig. 1b, Supplementary 
Table 2 and Supplementary Fig. 10). As assessed with about 1.2 Mb of 
genomic dideoxy data*'*"” (Supplementary Information section 4), the 
substitution error rate was about 1 per 10kb in single-copy regions, 
and about tenfold higher in transposable-element-rich regions. 
Further, RNA-seq reads covered about 100,000 SNPs per accession 
with 99.72% concordance (Supplementary Table 5), and junction 
sequences for 66 of 68 (97%) long indels and imbalanced substitutions 
were confirmed by PCR and dideoxy sequencing (Supplementary 
Table 8). The substitution error rate for our assemblies was comparable 
to that reported for four other A. thaliana genome assemblies’. 


Genome-wide patterns of ancestry 


The ancestral relationships of the accessions vary genome-wide. We 
computed phylogenies” across 1.25 million biallelic, non-private SNPs 
(Supplementary Information section 6). The ancestry of each pair of 
accessions within a phylogeny was quantified by using the genome- 
wide distribution of the minimum clade size of the subphylogeny 
containing the pair (Fig. 1c). Despite their wide geographical origins, 
with the exception of Po-0 and Oy-0, all pairs have distributions 
similar to that of an unstructured sample. The probability of recent 
co-ancestry is slightly higher than expected for a few pairs of accessions, 
with extended haplotype sharing at a minority of loci (Supplemen- 
tary Figs 11-15), perhaps reflecting selective sweeps’. Both linkage 
disequilibrium and correlation between neighbouring phylogenies 
decrease by 50% within 5kb (Fig. 1d and Supplementary Fig. 16). 
Variation among the 18 accessions is similar to a diverse global 
A. thaliana sample’* in nucleotide diversity (Supplementary Figs 11- 
15), correlation with genomic features (Supplementary Tables 9-12) 
and structural variants (Supplementary Fig. 17). 


Gene annotation and transcript diversity 


A naive projection of the coordinates of the 27,206 nuclear protein- 
coding genes from Col-0 (TAIR10 annotation) onto the 18 genomes 
predicted that 93.4% of proteins were changed in at least one acces- 
sion, with 32% of the total being affected by genic deletions, pre- 
mature termination codons, or other disruptions (Supplementary 
Table 13). This large tally of disrupted genes implies that reference 
annotations cannot be transferred reliably; in fact, re-annotation 
reveals compensating changes, ensuring that many genes encode 
apparently functional proteins (Fig. 2a). Thus, in 96.2% of the 8,757 
genes affected, the naive annotations were replaced by an alternative 
gene model in at least one accession (Fig. 2b and Supplementary 
Fig. 18). We predicted new splice sites in 64% of the 2,572 genes with 
splice site disruptions (in 696 cases the new sites were within 30 bp of 
the original ones; see, for example, Fig. 2a). Finally, there was evidence 
of alternative splicing in 2,106 genes (Supplementary Information 
sections 10.10-10.13). 

For genome annotation and expression analyses (for example 
Figs 2-4), we generated 78-bp RNA-seq reads from two biological 
replicates of seedling mRNA (about 9.5 million mapped reads per 
accession, including Col-0; Supplementary Information section 9, 
and Supplementary Table 14). We integrated read alignments” with 
sequence-based gene predictions” by using mGene.ngs (Supplemen- 
tary Information sections 9-10.3, and Supplementary Fig. 19). On 
average, 24,681 coding genes were predicted for each accession. 
Comparison of Col-0 de novo predictions with TAIR10 annotations 
(Supplementary Table 16) showed that these predictions are more 
accurate (transcript F-score 65.2%) than using the genome sequence 
(mGene”, 59.6%) or RNA-seq alignments alone (Cufflinks”’, 37.5%; 
Supplementary Table 17). Finally, we consolidated the de novo anno- 
tations by incorporating TAIR10 annotations where applicable 
(Supplementary Information section 10.4, and Supplementary Fig. 
20); novel transcript structures for a known TAIR10 gene were only 
accepted if each newly predicted intron was confirmed by RNA-seq 
alignments, or if the reference gene model was severely disrupted. 
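
The transcript F-scores quoted above can be read as a harmonic mean of precision and recall against the reference annotation. The following is a minimal sketch under that assumption; the exact matching criterion used by the authors is described in their Supplementary Information and is not reproduced here. The function name `f_score` and the toy exon coordinates are hypothetical.

```python
def f_score(predicted, reference):
    """Harmonic mean of precision and recall over exactly matching items.

    Assumption: a predicted transcript counts as correct only if its full
    exon structure is identical to a reference transcript.
    """
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)   # fraction of predictions that are correct
    recall = true_pos / len(reference)      # fraction of the reference that is recovered
    return 2 * precision * recall / (precision + recall)


# Hypothetical transcripts, each encoded as a tuple of (start, end) exons.
reference = {((100, 200), (300, 400)), ((500, 650),), ((700, 800), (900, 950))}
predicted = {
    ((100, 200), (300, 400)),   # exact match
    ((500, 640),),              # wrong exon end, counts as a miss
    ((700, 800), (900, 950)),   # exact match
    ((1000, 1100),),            # not in the reference
}
score = f_score(predicted, reference)
```

With two of four predictions correct and two of three reference transcripts recovered, precision is 1/2 and recall is 2/3, so the score is 4/7.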

We found, on average, 42,338 transcripts per accession (excluding 
Col-0), of which 5.5% (2,316) were novel (Table 1 and Supplementary 
Table 18). In each accession there were, on average, 319 novel genes 
(or gene fragments) supported by RNA-seq (Table 1); 717 novel genes 
were found in total, 496 whose sequence was present in Col-0 but not 
annotated, and 221 absent from the Col-0 genome but present in the 
de novo assemblies of the accessions. We found protein or expressed 
sequence tag matches for 74.9% of the new genes, primarily from 
A. thaliana, A. lyrata or other Brassicaceae species (Supplementary 
Information sections 10.8 and 10.9). 

For accession Can-0, we generated additional independent higher 
coverage RNA-seq data from seedling, root and floral bud, which we 
used to confirm 83.3% of re-annotated introns (read alignment over 
splice junction) and 59.9% of transcripts (confirmation of every 
intron, or read coverage of 50% of the transcript for single exon 
transcripts; Table 1). We also obtained additional RNA-seq data for 
Col-0 and found similar confirmation rates for the reference 
annotation (Supplementary Table 19). Moreover, for Can-0 we confirmed 
72.1% and 84.2% of novel introns and transcripts, respectively. Many 
novel introns stemmed from splice disruptions that tended to be weakly 
expressed, so RNA-seq evidence was scarcer (Supplementary Fig. 22). 
Finally, more than 75% of novel alternative splicing events were 
supported by RNA-seq (Supplementary Information section 10.5). 

Figure 2 | Transcript and protein variation. a, Example of a splice site change 
between two haplotypes for the gene AT1G64970. Haplotype I (Col-0) is spliced 
with an intron 6 bp (two amino acids) shorter than haplotype II (Ler-0); Po-0 
(heterozygous) shows allele-specific expression of both. b, Re-annotation of the 
FRIGIDA locus showing annotations for accessions Sf-2 (functional), and Col-0 
(truncated by a premature stop) and Ler-0 (non-functional) (Supplementary 
Figs 18 and 42). Right: the 19 accessions are shown clustered on the basis of the 
AA distance between their FRIGIDA amino-acid sequences. Common isoform 
clusters (at distance 2% or less; red line) are shown, leading to three clusters 
with three, seven and nine accessions. c, Proteome diversity for coding genes, 
pseudogenes and A. lyrata genes (top) and for genes with disruptions (bottom). 
Reported is the fraction of genes with relative AA distance to other accessions 
(average over pairs) in the given colour-coded interval (Supplementary 
Information section 10.7). d, Frequency of isoforms of coding genes and 
pseudogenes (top), and those associated with different disruptions (bottom). 


Proteome diversity 


To understand the effect of genetic diversity on proteins, it is insuf- 
ficient to study isolated DNA polymorphisms in the context of the 
reference annotation. We therefore defined the distance between two 
amino-acid (AA) sequences by the fraction of amino-acid residues that 
did not align identically in their global alignment. For example, for 
FRIGIDA, between Col-0 and Sf-2, a premature stop codon leads to 
an AA distance of 49% (Fig. 2b). In 77% of proteins, the mean AA 
distance between all accessions was less than 3% (Fig. 2c). However, on 
average, 747 proteins per accession had a distance larger than 50% to 
any TAIR10 protein, with markedly greater variation for pseudogenes. 
As expected, variation between A. thaliana and its congener A. lyrata™* 
exceeds that observed among A. thaliana accessions (Fig. 2c and 
Supplementary Fig. 23). Disruptions to splice sites and translation start 
and stop codons typically caused less severe effects than premature stop 
codons or frame shifts (Fig. 2c) when compensating splice sites created 
alternative in-frame splicing (for example Fig. 2a and Supplementary 
Fig. 24). 
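
The AA distance defined above (the fraction of residues that do not align identically in a global alignment) can be illustrated with a small sketch. The scoring scheme (match +1, mismatch -1, gap -1) and the choice of the alignment length as denominator are assumptions made for illustration only; the authors' actual parameters are given in Supplementary Information section 10.7. The function names and toy sequences are hypothetical.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment with a simple linear gap penalty."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def aa_distance(a, b):
    """Fraction of alignment columns that are not identical residue pairs."""
    aligned_a, aligned_b = global_align(a, b)
    if not aligned_a:
        return 0.0
    identical = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 1.0 - identical / len(aligned_a)


# Toy example: a premature stop that removes the second half of a protein.
full_length = "MKTAYIAKQR"
truncated = "MKTAY"
d = aa_distance(full_length, truncated)
```

In this toy case the truncated protein aligns identically over its five residues and the remaining five columns are gaps, so the distance is 0.5, which mirrors how a premature stop near the middle of FRIGIDA yields a distance close to 50%.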

Next, we identified protein isoforms across accessions (Fig. 2b, 
right; distinct isoforms differ by at least roughly 2% AA distance; 
Supplementary Information section 10.7). For 80% of protein coding 
genes the most frequent isoform was very common (frequency at least 
15 out of 19), whereas isoforms for pseudogenes usually occurred at 
lower frequency. Moreover, isoforms for large disruptions were rare 
(frequency 3 or less) for 37% of affected genes (Fig. 2d). This was most 
pronounced for premature stops and frameshifts, where purifying 
selection is expected to be strongest. 
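
One way to picture the isoform definition is to group accessions whose proteins lie within 2% AA distance of one another and then count the size of each group. The sketch below uses single-linkage grouping with a union-find structure; this is an assumption, since the clustering method actually used is specified only in Supplementary Information section 10.7. The accession labels and distances are invented for illustration.

```python
from itertools import combinations


def cluster_isoforms(names, distance, threshold=0.02):
    """Single-linkage grouping: link any pair at or below the threshold."""
    parent = {name: name for name in names}

    def find(x):
        # Follow parent pointers to the cluster representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if distance(a, b) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    # Largest cluster first, so its size is the frequency of the commonest isoform.
    return sorted(clusters.values(), key=len, reverse=True)


# Hypothetical pairwise AA distances; unlisted pairs are treated as 0.40.
toy = {
    frozenset(("A", "B")): 0.00,
    frozenset(("A", "C")): 0.01,
    frozenset(("B", "C")): 0.015,
    frozenset(("D", "E")): 0.02,
}


def toy_distance(a, b):
    return toy.get(frozenset((a, b)), 0.40)


isoforms = cluster_isoforms(["A", "B", "C", "D", "E"], toy_distance)
most_common_frequency = len(isoforms[0])
```

Here the five toy accessions fall into two isoforms, one shared by three accessions and one by two. Single linkage can chain together sequences that are individually more than 2% apart, so a stricter linkage rule may be preferable in practice.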

As expected*”, disease resistance genes of the coiled-coil and Toll 
interleukin 1 receptor subfamilies of the Nucleotide-Binding Leucine 
Rich Repeat (NB-LRR) gene family were predicted to encode the most 
variable proteins (Fig. 4a and Supplementary Fig. 26). F-box and 
defensin-like genes implicated in diverse processes including 
defence” were also highly variable. In contrast, housekeeping genes 
showed little variation. 


Variation in seedling gene expression 


Median expression heritability of protein-coding genes was 39%, sim- 
ilar to that of novel genes (36%) and pseudogenes (38%), and more 
than for non-coding RNAs (30%) (Supplementary Fig. 27). In total, 
75% (20,550) of protein-coding genes (and 21% of non-coding RNAs 
and 21% of pseudogenes) were expressed in at least one accession 
(false discovery rate (FDR) 5%), and 46% (9,360) of expressed pro- 
tein-coding genes were differentially expressed between at least one 
pair of accessions” (Fig. 3a; FDR 5%, Supplementary Information 
section 11). Of these, 19% (1,750) had more than tenfold expression 
changes, and 1.5% (142) more than 100-fold (Fig. 3b). For about 60% 
of genes, at least five accessions contributed to expression variation 
(Fig. 4d; Supplementary Information section 11.8). 
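
The fold-change categories used above can be sketched as the ratio between the highest and lowest expression level of a gene across accessions (compare the legend of Fig. 4c). The pseudocount that guards against division by zero, the function names and the expression values are assumptions for illustration, not the authors' normalization.

```python
def max_fold_change(expression, pseudocount=1.0):
    """Ratio of highest to lowest expression across accessions.

    `expression` maps accession name to a normalized expression value.
    The pseudocount is a hypothetical choice to keep the ratio finite
    when a gene is unexpressed in some accession.
    """
    values = [value + pseudocount for value in expression.values()]
    return max(values) / min(values)


def fold_change_category(fold_change):
    """Assign the categories quoted in the text (more than 10-fold, more than 100-fold)."""
    if fold_change > 100:
        return ">100-fold"
    if fold_change > 10:
        return ">10-fold"
    return "<=10-fold"


# Invented expression values for one gene in three accessions.
gene_expression = {"acc1": 4.0, "acc2": 499.0, "acc3": 49.0}
fc = max_fold_change(gene_expression)
category = fold_change_category(fc)
```

With these values the ratio is 500/5 = 100, which falls in the more-than-tenfold class but not the more-than-100-fold class because the comparison is strict.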

Although the small sample size (19) precludes genome-wide asso- 
ciation scans to identify trans expression quantitative trait loci (eQTLs), 
we identified potential cis-acting nucleotide variants, copy-number 
variants and gene structural variants (for example large indels and gene 
structure changes) associated with expression for 9% (836) of differ- 
entially expressed genes (FDR 5%; Supplementary Information section 
12.2; we assessed gene-copy-number variation as in Supplementary 
Information section 12.4). Much of this variation was highly heritable 
(Fig. 3a). Consistent with identifying likely causal variants, 85% and 
93% of associated SNPs and single-nucleotide indels for cis-eQTLs were 
within 5 and 10 kb of the gene, respectively, and were strikingly con- 
centrated in the 100-bp promoter region and 5’ genic sequences 
(Fig. 3c, d). This was also true for heritable intron retention events, in 
which most cis associations were within the intron or less than 1 kb 
distant (Supplementary Fig. 32). Our results corroborate the general 
findings**’ of extensive cis regulation of gene expression in A. thaliana. 
Neither environmental variation nor population structure markedly 
affected expression variation (Supplementary Information section 13). 
Copy-number and structural variants were associated with expression in 
3% (240) of differentially expressed genes, including 45% (64 out of 142) 
of genes with more than 100-fold differences (Fig. 3b), consistent with 
array studies”’. 

Differential gene expression varied by gene ontology (GO) and 
gene family (Fig. 4b-d, Supplementary Table 24 and Supplementary 
Figs 39-41). Seventeen of the 18 GO classifications that were enriched 
for differential expression (P<10 *) concerned response to the 
biotic environment, including pathogen defence and the production 
of glucosinolates”? to deter herbivores (Supplementary Table 24). 
These include NB-LRR genes (echoing protein variation), of which 
74% were differentially expressed at up to 400-fold change, and for 
which many accessions typically contributed to differential expression 
(Fig. 4b-d). Patterns for housekeeping genes (such as ribosomal 
proteins, eukaryotic initiation factors or kinesins) were markedly 
different: although many were differentially expressed, fold changes were 
generally small, with variation more often being limited to a few 
accessions (Fig. 4b-d). Differentially expressed genes generally had much 
higher nucleotide diversity at synonymous sites relative to other 
expressed genes, a pattern also observed but less extreme at 
non-synonymous sites (Supplementary Table 25). This suggests that 
differences in expression level were not due solely to reduced selective 
constraint. 

The type II MADS box transcription factor family*’ showed striking 
expression polymorphisms (Fig. 4b-d), including for the 
FLOWERING LOCUS C (FLC)** and MADS AFFECTING 
FLOWERING (MAF) genes*’. FLC, a floral inhibitor expressed highly 
in accessions that require prolonged cold (vernalization) to flower’, 
varied more than 400-fold (Supplementary Fig. 42). F-box and 
defensin-like genes were exceptional in that expression was restricted to a 
minority of genes (41% and 12%, respectively; Fig. 4b), perhaps 
reflecting tissue-specific or environment-specific expression*>*”. 
Our data suggest that high turnover for some F-box families in the 
A. thaliana lineage’ extends to gene expression as well. 

Figure 3 | Quantitative variation of coding gene 
expression. a, The overlap between heritable (more 
than 30%) and differentially expressed (FDR 5%) 
genes, and genes with a cis-eQTL (FDR 5%). 
b, Differentially expressed genes and genes with 
cis-eQTLs (FDR 5%) categorized by fold change. 
Nucleotide variants (orange bars; 647 cis-eQTLs) 
are SNPs and single-base indels; copy-number 
variants (green bars; 42 cis-eQTLs) are regions with 
elevated coverage in aligned genomic reads in at 
least one accession; gene structural variants (black 
bars; 227 cis-eQTLs) are accession-specific 
deletions, insertions or changes to the gene model. 
c, The spatial distribution of nucleotide-variant 
eQTLs relative to the start of protein-coding genes 
(FDR 5%, overlapping genes removed; n = 647). 
The line shows density of gene length. 
d, Frequencies of nucleotide-variant eQTLs in 
protein-coding genes, classified by component (bar 
widths are proportional to the components’ average 
physical lengths): red bars, upstream; yellow bars, 5’ 
untranslated region; green bars, coding sequence 
exons; blue bars, introns; cyan bars, 3’ untranslated 
region; grey bars, downstream. 


Conclusion 


Our study goes beyond cataloguing polymorphisms””” to provide 
genome sequences for a moderately sized population sample (see also 
refs 4, 16). In doing so, we were able to annotate each genome largely 
independently of the Col-0 reference. We found that disruptive poly- 
morphisms were frequently compensated for, thereby conserving 
coding potential and highlighting the limitation of inferring conse- 
quences of polymorphisms in the absence of complete sequence data. 

Our assemblies are accurate and largely complete in single-copy 
regions, although additional work will be needed to assemble the 
roughly 20% of the genome comprising repeats and transposable 
elements. Disentangling copy variation, long insertions and other 
genomic rearrangements remains a challenge. The methods we 
developed are of immediate relevance to the broader A. thaliana 
1001 Genomes Project’ and to other organisms, and highlight the 
importance of RNA-seq data for annotation. 
Figure 4 | Protein diversity and gene expression vary by gene category or 
family. The numbers next to each row are gene counts. The gene families were 
selected from Supplementary Figs 26 and 39-41 to represent the breadth of 
observed variation. a, Distribution of average AA distances to other accessions 
(compare with Fig. 2c). b, Fraction of unexpressed, expressed and differentially 
expressed genes (expressed is a superset of differentially expressed). 
c, Distribution of genes categorized by fold change (between lowest and highest 
across 19 accessions). d, Distribution of the numbers of accessions contributing 
to differential expression. TF, transcription factor; CC, coiled-coil; TIR, Toll 
interleukin-1 receptor; NB-LRR, nucleotide-binding leucine-rich repeat. 
Table 1 | Summary of gene predictions 

                     Total                             Novel 
Type                 Per accession   RNA-seq           Per accession   RNA-seq 
                                     confirmed (%)                     confirmed (%) 
Genes                33,197          62.7              319             88.4 
Transcripts          42,338          59.9              2,316           84.2 
Introns              127,640         83.3              1,345           72.1 
Start codons         33,264          n.a.              503             n.a. 
Stop codons          33,720          n.a.              528             n.a. 
Intron retentions    1,192           78.1              873             76.5 
Exon skips           80              80.5              38              76.7 

‘Total’ and ‘novel’ are average counts over all 19 accessions. ‘RNA-seq confirmed’ gives the percentage 
fully confirmed using independent RNA-seq data (three tissues) for Can-0, the most divergent accession. 


Finally, despite using only 19 accessions, we fine-mapped cis- 
eQTLs to small genomic regions (less than 10 kb), suggesting that 
analogous genome-wide scans in the more than 700 derived 
MAGIC lines could have single-gene mapping resolution for some 
loci. Our findings indicate that the MAGIC lines, for which popu- 
lation structure is largely mitigated’, will be an important and com- 
plementary resource to genome-wide association studies in 
A. thaliana populations”. 


METHODS SUMMARY 


We used the same seed stocks for Col-0 and the 18 accessions Bur-0, Can-0, Ct-1, 
Edi-0, Hi-0, Kn-0, Ler-0, Mt-0, No-0, Po-0, Oy-0, Rsch-4, Sf-2, Tsu-0, Wil-2, Ws- 
0, Wu-0 and Zu-0 that originated the MAGIC lines. DNA and RNA sequencing 
was performed with standard (DNA) or modified (RNA-seq) Illumina protocols. 
All methods are described fully in Supplementary Methods; software is available 
from the authors on request. 